Preface


The Alpha System Reference Manual is divided into three parts, four appendixes, and two indexes.


Each part or section of a part describes a major portion of the Alpha architecture. Each contains its
own Table of Contents and Index. Additional sections will be incorporated as development proceeds
on the architecture.


The following table outlines the contents of the manual:


Name          Symbol   Contents

Part One      (I)      Common Architecture
                       This part describes the architecture that is common to and
                       required by all implementations.

Part Two      (II)     Specific Operating System PALcode Architecture
                       This part contains sections that describe how the following
                       operating systems relate to the Alpha architecture:

                       Section Name and Contents       Symbol
                       OpenVMS Alpha Software          (II-A)
                       DIGITAL UNIX Software           (II-B)
                       Windows NT Alpha Software       (II-C)

Part Three    (III)    Console Interface Architecture
                       This part describes an architected console firmware
                       implementation.

Appendixes             Because information in the appendixes can be shared by more
                       than one section, appendixes are grouped together at the end of
                       the manual.

Indexes                The index at the end of the manual is structured like a master
                       index. Index entries are called out by the appropriate symbol: (I),
                       (II), and so forth, associated with the corresponding part or
                       section. Index entries for the appendixes are called out by appendix
                       name and page number.

                       Following the index for the entire manual is an index of the
                       instructions. The instruction index is the easiest way to find
                       primary documentation for the Alpha instruction set and the
                       PALcode instructions for each operating system.
Common Architecture (I)
This part describes the common Alpha architecture and contains the following:


• Chapter 1, Introduction (I)
• Chapter 2, Basic Architecture (I)
• Chapter 3, Instruction Formats (I)
• Chapter 4, Instruction Descriptions (I)
• Chapter 5, System Architecture and Programming Implications (I)
• Chapter 6, Common PALcode Architecture (I)
• Chapter 7, Console Subsystem Overview (I)
• Chapter 8, Input/Output Overview (I)
Chapter 1


Introduction (I)


Alpha is a 64-bit load/store RISC architecture that is designed with particular emphasis on the
three elements that most affect performance: clock speed, multiple instruction issue, and
multiple processors.


The Alpha architects examined and analyzed current and theoretical RISC architecture design 
elements and developed high-performance alternatives for the Alpha architecture. The archi- 
tects adopted only those design elements that appeared valuable for a projected 25-year design 
horizon. Thus, Alpha becomes the first 21st century computer architecture. 


The Alpha architecture is designed to avoid bias toward any particular operating system or pro- 
gramming language. Alpha supports the OpenVMS Alpha, DIGITAL UNIX, and Windows 
NT Alpha operating systems and supports simple software migration for applications that run 
on those operating systems. 


This manual describes in detail how Alpha is designed to be the leadership 64-bit architecture 
of the computer industry. 


1.1 The Alpha Approach to RISC Architecture 


Alpha Is a True 64-Bit Architecture 


Alpha was designed as a 64-bit architecture. All registers are 64 bits in length and all opera- 
tions are performed between 64-bit registers. It is not a 32-bit architecture that was later 
expanded to 64 bits. 


Alpha Is Designed for Very High-Speed Implementations 


The instructions are very simple. All instructions are 32 bits in length. Memory operations are 
either loads or stores. All data manipulation is done between registers. 


The Alpha architecture facilitates pipelining multiple instances of the same operations because 
there are no special registers and no condition codes. 


The instructions interact with each other only by one instruction writing a register or memory
and another instruction reading from the same place. That makes it particularly easy to build
implementations that issue multiple instructions every CPU cycle.
Alpha makes it easy to maintain binary compatibility across multiple implementations and
easy to maintain full speed on multiple-issue implementations. For example, there are no 
implementation-specific pipeline timing hazards, no load-delay slots, and no branch-delay 
slots. 


The Alpha Approach to Byte Manipulation 


The Alpha architecture reads and writes bytes between registers and memory with the LDBU 
and STB instructions. (Alpha also supports word read/writes with the LDWU and STW 
instructions.) 


Byte shifting and masking are performed with normal 64-bit register-to-register instructions,
crafted to keep instruction sequences short.


The Alpha Approach to Multiprocessor Shared Memory 


As viewed from a second processor (including an I/O device), a sequence of reads and writes 
issued by one processor may be arbitrarily reordered by an implementation. This allows imple- 
mentations to use multibank caches, bypassed write buffers, write merging, pipelined writes 
with retry on error, and so forth. If strict ordering between two accesses must be maintained, 
explicit memory barrier instructions can be inserted in the program. 


The basic multiprocessor interlocking primitive is a RISC-style load_locked, modify, 
store_conditional sequence. If the sequence runs without interrupt, exception, or an interfering 
write from another processor, then the conditional store succeeds. Otherwise, the store fails 
and the program eventually must branch back and retry the sequence. This style of interlock- 
ing scales well with very fast caches and makes Alpha an especially attractive architecture for 
building multiple-processor systems. 
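

The retry pattern described above can be illustrated with a small single-threaded model. This is only a sketch: the `Cell` class, its `load_locked`, `write`, and `store_conditional` methods, and the `atomic_increment` helper are hypothetical names invented for this illustration. They mimic the behavior described in the text and are not Alpha instructions or part of any real API.

```python
class Cell:
    """Toy model of one memory location with a lock flag per processor."""

    def __init__(self, value=0):
        self.value = value
        self._locked_by = set()  # processors whose lock flag is still valid

    def load_locked(self, cpu):
        """Read the location and set this processor's lock flag."""
        self._locked_by.add(cpu)
        return self.value

    def write(self, cpu, value):
        """Ordinary store: clears the lock flag of every other processor."""
        self.value = value
        self._locked_by &= {cpu}

    def store_conditional(self, cpu, value):
        """Store only if this processor's lock flag survived; report success."""
        if cpu not in self._locked_by:
            return False
        self.value = value
        self._locked_by.clear()
        return True


def atomic_increment(cell, cpu, interfere=None):
    """Repeat load_locked / modify / store_conditional until the store succeeds.

    `interfere`, if given, is called once between the first load and the first
    store to simulate a write from another processor. Returns the attempt count.
    """
    attempts = 0
    while True:
        attempts += 1
        old = cell.load_locked(cpu)
        if interfere is not None and attempts == 1:
            interfere(cell)
        if cell.store_conditional(cpu, old + 1):
            return attempts
```

In this model an undisturbed sequence succeeds on the first attempt, while an interfering write forces exactly one retry, after which the increment is applied to the value the other processor wrote.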


Alpha Instructions Include Hints for Achieving Higher Speed 


A number of Alpha instructions include hints for implementations, all aimed at achieving 
higher speed. 


• Calculated jump instructions have a target hint that can allow much faster subroutine
  calls and returns.


• There are prefetching hints for the memory system that can allow much higher cache hit
  rates.


• There are granularity hints for the virtual-address mapping that can allow much more
  effective use of translation lookaside buffers for large contiguous structures.


PALcode — Alpha’s Very Flexible Privileged Software Library 


A Privileged Architecture Library (PALcode) is a set of subroutines that are specific to a par- 
ticular Alpha operating system implementation. These subroutines provide operating-system 
primitives for context switching, interrupts, exceptions, and memory management. PALcode 
is similar to the BIOS libraries that are provided in personal computers. 


PALcode subroutines are invoked by implementation hardware or by software CALL_PAL
instructions.


PALcode is written in standard machine code with some implementation-specific extensions
to provide access to low-level hardware.


PALcode lets Alpha implementations run the full OpenVMS Alpha, DIGITAL UNIX, and 
Windows NT Alpha operating systems. PALcode can provide this functionality with little 
overhead. For example, the OpenVMS Alpha PALcode instructions let Alpha run OpenVMS 
with little more hardware than that found on a conventional RISC machine: the PAL mode bit 
itself, plus four extra protection bits in each translation buffer entry. 


Other versions of PALcode can be developed for real-time, teaching, and other applications. 


PALcode makes Alpha an especially attractive architecture for multiple operating systems. 


Alpha and Programming Languages 


Alpha is an attractive architecture for compiling a large variety of programming languages. 
Alpha has been carefully designed to avoid bias toward one or two programming languages. 
For example: 


• Alpha does not contain a subroutine call instruction that moves a register window by a
  fixed amount. Thus, Alpha is a good match for programming languages with many
  parameters and programming languages with no parameters.


• Alpha does not contain a global integer overflow enable bit. Such a bit would need to
  be changed at every subroutine boundary when a FORTRAN program calls a C program.


1.2 Data Format Overview 


Alpha is a load/store RISC architecture with the following data characteristics: 
• All operations are done between 64-bit registers.

• Memory is accessed via 64-bit virtual byte addresses, using the little-endian or,
  optionally, the big-endian byte numbering convention.

• There are 32 integer registers and 32 floating-point registers.

• Longword (32-bit) and quadword (64-bit) integers are supported.

• Five floating-point data types are supported:

  - VAX F_floating (32-bit)
  - VAX G_floating (64-bit)
  - IEEE single (32-bit)
  - IEEE double (64-bit)
  - IEEE extended (128-bit)
1.3 Instruction Format Overview


As shown in Figure 1-1, Alpha instructions are all 32 bits in length. There are four major
instruction format classes that contain 0, 1, 2, or 3 register fields. All formats have a 6-bit
opcode.


Figure 1-1: Instruction Format Overview


[Figure: 32-bit instruction layouts with field boundaries marked at bits 31, 26, 25, 21, 20,
16, 15, 5, 4, and 0. Labeled formats include the PALcode Format and the Branch Format.]





• PALcode instructions specify, in the function code field, one of a few dozen complex
  operations to be performed.


• Conditional branch instructions test register Ra and specify a signed 21-bit PC-relative
  longword target displacement. Subroutine calls put the return address in register Ra.


• Load and store instructions move bytes, words, longwords, or quadwords between
  register Ra and memory, using Rb plus a signed 16-bit displacement as the memory
  address.


• Operate instructions for floating-point and integer operations are both represented in
  Figure 1-1 by the operate format illustration and are as follows:


  - Word and byte sign-extension operators.


  - Floating-point operations use Ra and Rb as source registers and write the result in
    register Rc. There is an 11-bit extended opcode in the function field.


  - Integer operations use Ra and Rb or an 8-bit literal as the source operand, and write
    the result in register Rc.


  - Integer operate instructions can use the Rb field and part of the function field to
    specify an 8-bit literal. There is a 7-bit extended opcode in the function field.
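

The field descriptions above can be sketched as a small decoder. This is an illustrative sketch, not a definitive decoder: the helper names (`field`, `sign_extend`, `decode_memory_format`, `decode_branch_format`) are invented here, and the placement of the displacement in the low-order bits (<15:0> for load/store, <20:0> for branches) is an assumption inferred from the 6-bit opcode, the 5-bit register fields, and the displacement widths stated in the text.

```python
def field(word, hi, lo):
    """Return bits <hi:lo> of a 32-bit instruction word, both ends inclusive."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_memory_format(instr):
    """Load/store layout: opcode, Ra, Rb, signed 16-bit displacement."""
    return {
        "opcode": field(instr, 31, 26),
        "ra": field(instr, 25, 21),
        "rb": field(instr, 20, 16),
        "disp": sign_extend(field(instr, 15, 0), 16),
    }


def decode_branch_format(instr):
    """Branch layout: opcode, Ra, signed 21-bit longword displacement."""
    return {
        "opcode": field(instr, 31, 26),
        "ra": field(instr, 25, 21),
        "disp": sign_extend(field(instr, 20, 0), 21),
    }
```

Because every format keeps the opcode in the same six bits, a decoder of this shape can inspect the opcode first and then choose which of the remaining layouts applies.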


1.4 Instruction Overview 


PALcode Instructions 


As described in Section 1.1, a Privileged Architecture Library (PALcode) is a set of subrou- 
tines that is specific to a particular Alpha operating-system implementation. These subroutines 
can be invoked by hardware or by software CALL_PAL instructions, which use the function 
field to vector to the specified subroutine. 
Branch Instructions


Conditional branch instructions can test a register for positive/negative or for zero/nonzero, 
and they can test integer registers for even/odd. Unconditional branch instructions can write a 
return address into a register. 


There is also a calculated jump instruction that branches to an arbitrary 64-bit address in a 
register. 


Load/Store Instructions 


Load and store instructions move 8-bit, 16-bit, 32-bit, or 64-bit aligned quantities from and to 
memory. Memory addresses are flat 64-bit virtual addresses with no segmentation. 


The VAX floating-point load/store instructions swap words to give a consistent register format 
for floating-point operations. 


A 32-bit integer datum is placed in a register in a canonical form that makes 33 copies of the 
high bit of the datum. A 32-bit floating-point datum is placed in a register in a canonical form 
that extends the exponent by 3 bits and extends the fraction with 29 low-order zeros. The 32- 
bit operates preserve these canonical forms. 
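

The canonical form for 32-bit integers amounts to sign extension into the upper half of the register. The following sketch models only the integer case; the function names `canonical_longword` and `addl` are hypothetical and chosen for illustration, and overflow detection is deliberately left out.

```python
def canonical_longword(value32):
    """Return the 64-bit register image of a 32-bit integer.

    Bits <63:31> of the result (33 bits) are all copies of bit 31 of the input.
    """
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        return value32 | 0xFFFFFFFF00000000
    return value32


def addl(a, b):
    """Model a 32-bit add that keeps its result in canonical form.

    Overflow is ignored in this sketch; only the low 32 bits of the sum are used.
    """
    return canonical_longword((a + b) & 0xFFFFFFFF)
```

Since the 32-bit operates always return a canonical result, their outputs can feed further 32-bit or 64-bit operations without a separate conversion step.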


Compilers, as directed by user declarations, can generate any mixture of 32-bit and 64-bit oper- 
ations. The Alpha architecture has no 32/64 mode bit. 


Integer Operate Instructions 


The integer operate instructions manipulate full 64-bit values and include the usual assortment 
of arithmetic, compare, logical, and shift instructions. 


There are just three 32-bit integer operates: add, subtract, and multiply. They differ from their 
64-bit counterparts only in overflow detection and in producing 32-bit canonical results. 


There is no integer divide instruction. 


The Alpha architecture also supports the following additional operations: 
• Scaled add/subtract instructions for quick subscript calculation
• 128-bit multiply for division by a constant, and multiprecision arithmetic
• Conditional move instructions for avoiding branch instructions
• An extensive set of in-register byte and word manipulation instructions
• A set of multimedia instructions that support graphics and video
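

Because there is no integer divide instruction, division by a constant is typically replaced by a multiply that keeps the upper half of the 128-bit product. The sketch below shows the idea for unsigned division by 10. The names `mul_high` and `div10` are hypothetical, and the constant is the widely used reciprocal for this divisor, not something taken from this manual.

```python
MASK64 = (1 << 64) - 1


def mul_high(a, b):
    """Upper 64 bits of the 128-bit product of two unsigned 64-bit values."""
    return ((a & MASK64) * (b & MASK64)) >> 64


# ceil(2**67 / 10). Multiplying by it and discarding the low 67 bits of the
# product yields floor(x / 10) for every unsigned 64-bit x.
MAGIC_DIV10 = 0xCCCCCCCCCCCCCCCD


def div10(x):
    """Unsigned 64-bit division by 10 using one high multiply and one shift."""
    return mul_high(x, MAGIC_DIV10) >> 3
```

A compiler can precompute such a constant for any fixed divisor, so the run-time cost is one multiply and one shift.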


Integer overflow trap enable is encoded in the function field of each instruction, rather than 
kept in a global state bit. Thus, for example, both ADDQ/V and ADDQ opcodes exist for spec- 
ifying 64-bit ADD with and without overflow checking. That makes it easier to pipeline 
implementations. 
Floating-Point Operate Instructions


The floating-point operate instructions include four complete sets of VAX and IEEE
arithmetic instructions, plus instructions for performing conversions between floating-point and
integer quantities.


In addition to the operations found in conventional RISC architectures, Alpha includes condi- 
tional move instructions for avoiding branches and merge sign/exponent instructions for 
simple field manipulation. 


The arithmetic trap enables and rounding mode are encoded in the function field of each 
instruction, rather than kept in global state bits. That makes it easier to pipeline 
implementations. 


1.5 Instruction Set Characteristics 


Alpha instruction set characteristics are as follows: 


• All instructions are 32 bits long and have a regular format.


• There are 32 integer registers (R0 through R31), each 64 bits wide. R31 reads as zero,
  and writes to R31 are ignored.


• All integer data manipulation is between integer registers, with up to two variable
  register source operands (one may be an 8-bit literal) and one register destination operand.


• There are 32 floating-point registers (F0 through F31), each 64 bits wide. F31 reads as
  zero, and writes to F31 are ignored.


• All floating-point data manipulation is between floating-point registers, with up to two
  register source operands and one register destination operand.


• Instructions can move data in an integer register file to a floating-point register file, and
  data in a floating-point register file to an integer register file. The instructions do not
  interpret bits in the register files and do not access memory.


• All memory reference instructions are of the load/store type that moves data between
  registers and memory.


• There are no branch condition codes. Branch instructions test an integer or
  floating-point register value, which may be the result of a previous compare.


• Integer and logical instructions operate on quadwords.


• Floating-point instructions operate on G_floating, F_floating, and IEEE extended,
  double, and single operands. D_floating "format compatibility," in which binary files of
  D_floating numbers may be processed, but without the last 3 bits of fraction precision,
  is also provided.


• A minimal number of VAX compatibility instructions are included.


1.6 Terminology and Conventions 


The following sections describe the terminology and conventions used in this book. 
1.6.1 Numbering


All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other
than decimal are indicated with the name of the base in subscript form, for example, 10₁₆.


1.6.2 Security Holes 


A security hole is an error of commission, omission, or oversight in a system that allows pro- 
tection mechanisms to be bypassed. 


Security holes exist when unprivileged software (software running outside of kernel mode) 
can: 


• Affect the operation of another process without authorization from the operating
  system;


• Amplify its privilege without authorization from the operating system; or


• Communicate with another process, either overtly or covertly, without authorization
  from the operating system.


The Alpha architecture has been designed to contain no architectural security holes. Hardware 
(processors, buses, controllers, and so on) and software should likewise be designed to avoid 
security holes. 


1.6.3 UNPREDICTABLE and UNDEFINED 


The terms UNPREDICTABLE and UNDEFINED are used throughout this book. Their mean- 
ings are quite different and must be carefully distinguished. 


In particular, only privileged software (software running in kernel mode) can trigger UNDE- 
FINED operations. Unprivileged software cannot trigger UNDEFINED operations. However, 
either privileged or unprivileged software can trigger UNPREDICTABLE results or 
occurrences. 


UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor; 
it continues to execute instructions in its normal manner. In contrast, UNDEFINED operation 
can halt the processor or cause it to lose information. 


The terms UNPREDICTABLE and UNDEFINED can be further described as follows: 


UNPREDICTABLE 


• Results or occurrences specified as UNPREDICTABLE may vary from moment to
  moment, implementation to implementation, and instruction to instruction within
  implementations. Software can never depend on results specified as UNPREDICTABLE.


• An UNPREDICTABLE result may acquire an arbitrary value subject to a few
  constraints. Such a result may be an arbitrary function of the input operands or of any state
  information that is accessible to the process in its current access mode. UNPREDICTABLE
  results may be unchanged from their previous values.

  Operations that produce UNPREDICTABLE results may also produce exceptions.


• An occurrence specified as UNPREDICTABLE may happen or not based on an
  arbitrary choice function. The choice function is subject to the same constraints as are
  UNPREDICTABLE results and, in particular, must not constitute a security hole.


Specifically, UNPREDICTABLE results must not depend upon, or be a function of, 
the contents of memory locations or registers that are inaccessible to the current 
process in the current access mode. 


Also, operations that may produce UNPREDICTABLE results must not: 


— Write or modify the contents of memory locations or registers to which the current 
process in the current access mode does not have access, or 


— Halt or hang the system or any of its components. 


For example, a security hole would exist if some UNPREDICTABLE result depended 
on the value of a register in another process, on the contents of processor temporary 
registers left behind by some previously running process, or on a sequence of actions 
of different processes. 


UNDEFINED 


• Operations specified as UNDEFINED may vary from moment to moment, implementation
  to implementation, and instruction to instruction within implementations. The
  operation may vary in effect from nothing to stopping system operation.


• UNDEFINED operations may halt the processor or cause it to lose information.
  However, UNDEFINED operations must not cause the processor to hang, that is, reach an
  unhalted state from which there is no transition to a normal state in which the machine
  executes instructions.


1.6.4 Ranges and Extents 


Ranges are specified by a pair of numbers separated by two periods and are inclusive. For 
example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4. 


Extents are specified by a pair of numbers in angle brackets separated by a colon and are inclu- 
sive. For example, bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3. 
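

Both notations are inclusive at each end, which differs from the half-open ranges common in programming languages. The following sketch makes the convention concrete; `inclusive_range` and `extract_extent` are hypothetical helper names used only for this illustration.

```python
def inclusive_range(lo, hi):
    """Integers in the range lo..hi, with both ends included."""
    return list(range(lo, hi + 1))


def extract_extent(value, hi, lo):
    """Value held in bits <hi:lo> of `value`, with both ends included."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)
```

For instance, the extent <7:3> is five bits wide, so `extract_extent` always returns a value in the range 0..31 for it.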


1.6.5 ALIGNED and UNALIGNED 


In this document the terms ALIGNED and NATURALLY ALIGNED are used interchange- 
ably to refer to data objects that are powers of two in size. An aligned datum of size 2**N is 
stored in memory at a byte address that is a multiple of 2**N, that is, one that has N low-order 
zeros. Thus, an aligned 64-byte stack frame has a memory address that is a multiple of 64. 


If a datum of size 2**N is stored at a byte address that is not a multiple of 2**N, it is called
UNALIGNED.
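

The alignment rule reduces to a test on the low-order address bits. A minimal sketch, assuming sizes are given in bytes; the function name `is_aligned` is hypothetical.

```python
def is_aligned(address, size):
    """True if a datum of `size` bytes (a power of two) is naturally aligned.

    A datum of size 2**N is aligned when the N low-order address bits are zero.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return (address & (size - 1)) == 0
```

With this test, the 64-byte stack frame from the example above is aligned at address 128 but unaligned at address 96.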
1.6.6 Must Be Zero (MBZ)


Fields specified as Must be Zero (MBZ) must never be filled by software with a non-zero 
value. These fields may be used at some future time. If the processor encounters a non-zero 
value in a field specified as MBZ, an Illegal Operand exception occurs. 


1.6.7 Read As Zero (RAZ) 


Fields specified as Read as Zero (RAZ) return a zero when read. 


1.6.8 Should Be Zero (SBZ) 
Fields specified as Should be Zero (SBZ) should be filled by software with a zero value.
Nonzero values in SBZ fields produce UNPREDICTABLE results and may produce extraneous
instruction-issue delays.


1.6.9 Ignore (IGN) 


Fields specified as Ignore (IGN) are ignored when written. 


1.6.10 Implementation Dependent (IMP) 
Fields specified as Implementation Dependent (IMP) may be used for implementation-specific
purposes. Each implementation must document fully the behavior of all fields marked as IMP
by the Alpha specification.


1.6.11 Illustration Conventions 


Illustrations that depict registers or memory follow the convention that increasing addresses 
run right to left and top to bottom. 


1.6.12 \Conditionalized (Backslash) Text Convention 
At certain points in the manual, comments on why certain decisions were made, unresolved
issues, etc., are included between a pair of backslashes. These comments provide additional
clarification and are removed from externally distributed editions.\


1.6.13 Macro Code Example Conventions 


All instructions in macro code examples are either listed in Common Architecture (I), Chapter 
4 or OpenVMS Software II-A, Chapter 2, or are stylized code forms found in Appendix A. 
1.7 \Revision History


Revision 7.0, November 10, 1997
1. Added ECO 81, byte and word
2. Alpha AXP -> Alpha
3. Because of EV6, remove imprecise arithmetic trap text


Revision 6.0, December, 1994
1. Added Windows NT Alpha
2. Added ECO 82 (X_float)
3. Alpha -> Alpha AXP
4. Fixed Ra -> Rb in Section 1.3
5. Added ECO 76 (bi-endian)


Revision 5.0, May 12, 1992
1. VMS -> OpenVMS
2. Converted to SDML
3. Removed reference to EVAX


Revision 4.0, March 29, 1991
1. Typos
2. Correct security holes text
3. Upgrade UNPREDICTABLE definition
4. Add Implementation Dependent definition


Revision 3.0, March 2, 1990
1. Strengthen UNPREDICTABLE definition
2. Add UNALIGNED definition
3. Add Security Hole definition


Revision 2.0, October 4, 1989
1. Change the read as zero, write ignored registers to R31 and F31
2. Update Instruction Set Characteristics for new insert and merge byte instructions


Revision 1.0, May 23, 1989
1. Change MBZ and SBZ definitions


Revision 0.0, March 15, 1988
1. Initial version\
Chapter 2


Basic Architecture (I) 


2.1 Addressing 


The basic addressable unit in the Alpha architecture is the 8-bit byte. Virtual addresses are 64 
bits long. An implementation may support a smaller virtual address space. The minimum vir- 
tual address size is 43 bits. 


Virtual addresses as seen by the program are translated into physical memory addresses by the 
memory management mechanism. 


Although the data types in Section 2.2 are described in terms of little-endian byte addressing,
implementations may also include big-endian addressing support, as described in Section 2.3.
All current implementations have some big-endian support.


2.2 Data Types 


Following are descriptions of the Alpha architecture data types. 


2.2.1 Byte 


A byte is 8 contiguous bits starting on an addressable byte boundary. The bits are numbered 
from right to left, 0 through 7, as shown in Figure 2-1. 


Figure 2-1: Byte Format 


7 0 


A byte is specified by its address A. A byte is an 8-bit value. The byte is only supported in 
Alpha by the load, store, sign-extend, extract, mask, insert, and zap instructions. 


2.2.2 Word 


A word is 2 contiguous bytes starting on an arbitrary byte boundary. The bits are numbered 
from right to left, 0 through 15, as shown in Figure 2-2. 
Figure 2-2: Word Format


15 0 


A word is specified by its address, the address of the byte containing bit 0. 


A word is a 16-bit value. The word is only supported in Alpha by the load, store, sign-extend, 
extract, mask, and insert instructions. 


2.2.3 Longword 


A longword is 4 contiguous bytes starting on an arbitrary byte boundary. The bits are num- 
bered from right to left, 0 through 31, as shown in Figure 2-3. 


Figure 2-3: Longword Format 


31 0 


A longword is specified by its address A, the address of the byte containing bit 0. A longword 
is a 32-bit value. 


When interpreted arithmetically, a longword is a two’s-complement integer with bits of
increasing significance from 0 through 30. Bit 31 is the sign bit. The longword is only
supported in Alpha by sign-extended load and store instructions and by longword arithmetic
instructions.


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
longword operands that are not naturally aligned. (A naturally aligned longword has zero 
as the low-order two bits of its address.) 


2.2.4 Quadword 


A quadword is 8 contiguous bytes starting on an arbitrary byte boundary. The bits are num- 
bered from right to left, 0 through 63, as shown in Figure 2-4. 


Figure 2-4: Quadword Format 


63 0 
A quadword is specified by its address A, the address of the byte containing bit 0. A quadword
is a 64-bit value. When interpreted arithmetically, a quadword is either a two’s-complement 
integer with bits of increasing significance from 0 through 62 and bit 63 as the sign bit, or an 
unsigned integer with bits of increasing significance from 0 through 63. 


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
quadword operands that are not naturally aligned. (A naturally aligned quadword has zero 
as the low-order three bits of its address.) 


2.2.5 VAX Floating-Point Formats 


VAX floating-point numbers are stored in one set of formats in memory and in a second set of 
formats in registers. The floating-point load and store instructions convert between these for- 
mats purely by rearranging bits; no rounding or range-checking is done by the load and store 
instructions. 


2.2.5.1 F_floating 


An F_floating datum is 4 contiguous bytes in memory starting on an arbitrary byte boundary. 
The bits are labeled from right to left, 0 through 31, as shown in Figure 2-5.


Figure 2-5: F_floating Datum 


31 16 15 14 7 6 0 


An F_floating operand occupies 64 bits in a floating register, left-justified in the 64-bit
register, as shown in Figure 2-6.


Figure 2-6: F_floating Register Format 


63 62 52 51 29 28 0 


The F_floating load instruction reorders bits on the way in from memory, expands the expo- 
nent from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register 
an equivalent G_floating number suitable for either F_floating or G_floating operations. The 
mapping from 8-bit memory-format exponents to 11-bit register-format exponents is shown in 
Table 2-1. This mapping preserves both normal values and exceptional values. 
Table 2-1: F_floating Load Exponent Mapping (MAP_F)


Memory <14:7>    Register <62:52>

1 1111111        1 000 1111111
1 xxxxxxx        1 000 xxxxxxx    (xxxxxxx not all 1's)
0 xxxxxxx        0 111 xxxxxxx    (xxxxxxx not all 0's)
0 0000000        0 000 0000000
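

The exponent mapping in Table 2-1 can be expressed as a short function. This is a sketch of the table only, not of the full load instruction; the name `map_f` is borrowed from the table title, and its use as a Python function is an assumption made for illustration.

```python
def map_f(exp8):
    """Map an 8-bit memory-format exponent to the 11-bit register-format exponent.

    The high bit and the low seven bits are copied unchanged. The three bits
    inserted between them are 000 when the high bit is 1 or the whole exponent
    is 0, and 111 otherwise.
    """
    if not 0 <= exp8 <= 0xFF:
        raise ValueError("exponent must fit in 8 bits")
    high = (exp8 >> 7) & 1
    low7 = exp8 & 0x7F
    if high or low7 == 0:
        middle = 0b000
    else:
        middle = 0b111
    return (high << 10) | (middle << 7) | low7
```

For every nonzero input the mapping turns an excess-128 exponent into the excess-1024 exponent with the same true value, which is why the register image can be treated as an equivalent G_floating number.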


The F_floating store instruction reorders register bits on the way to memory and does no 
checking of the low-order fraction bits. Register bits <61:59> and <28:0> are ignored by the 
store instruction. 


An F_floating datum is specified by its address A, the address of the byte containing bit 0. The 
memory form of an F_floating datum is sign magnitude with bit 15 the sign bit, bits <14:7> an 
excess-128 binary exponent, and bits <6:0> and <31:16> a normalized 24-bit fraction with the 
redundant most significant fraction bit not represented. Within the fraction, bits of increasing 
significance are from 16 through 31 and 0 through 6. The 8-bit exponent field encodes the
values 0 through 255. An exponent value of 0, together with a sign bit of 0, is taken to indicate
that the F_floating datum has a value of 0. 


If the result of a VAX floating-point format instruction has a value of zero, the instruction 
always produces a datum with a sign bit of 0, an exponent of 0, and all fraction bits of 0. Expo- 
nent values of 1..255 indicate true binary exponents of -127..127. An exponent value of 0, 
together with a sign bit of 1, is taken as a reserved operand. Floating-point instructions pro- 
cessing a reserved operand take an arithmetic exception. The value of an F_floating datum is 
in the approximate range 0.29*10**-38 through 1.7*10**38. The precision of an F_floating
datum is approximately one part in 2**23, typically 7 decimal digits. See Section 4.7. 
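

The memory layout described above can be checked with a small decoder. This is a sketch under stated assumptions: the function name `decode_f_floating` is hypothetical, the input is taken to be the 32-bit memory image read as a little-endian longword, and the hidden fraction bit is restored so that the fraction lies in the interval [0.5, 1), which is consistent with the exponent and value ranges quoted in the text.

```python
def decode_f_floating(longword):
    """Decode a 32-bit F_floating memory image into a Python float."""
    sign = (longword >> 15) & 1
    exponent = (longword >> 7) & 0xFF
    # Stored fraction: bits <6:0> are the most significant part,
    # bits <31:16> the least significant part (23 stored bits in total).
    fraction = ((longword & 0x7F) << 16) | ((longword >> 16) & 0xFFFF)
    if exponent == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0
    # Restore the redundant leading bit to obtain the 24-bit fraction.
    mantissa = (fraction | (1 << 23)) / float(1 << 24)
    value = mantissa * 2.0 ** (exponent - 128)
    return -value if sign else value
```

Under these assumptions the smallest positive magnitude is 0.5 * 2**-127 and the largest is just under 2**127, matching the approximate range given above.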


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
F_floating operands that are not naturally aligned. (A naturally aligned F_floating datum 
has zero as the low-order two bits of its address.) 


2.2.5.2 G_floating 


A G_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary. 
The bits are labeled from right to left, 0 through 63, as shown in Figure 2-7. 


Figure 2-7: G_floating Datum 


31 16 15 14 4 3 0
A G_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2-8.


Figure 2-8: G_floating Register Format 


63 62 52 51 32 31 0


A G_floating datum is specified by its address A, the address of the byte containing bit 0. The 
form of a G_floating datum is sign magnitude with bit 15 the sign bit, bits <14:4> an excess- 
1024 binary exponent, and bits <3:0> and <63:16> a normalized 53-bit fraction with the redun- 
dant most significant fraction bit not represented. Within the fraction, bits of increasing 
significance are from 48 through 63, 32 through 47, 16 through 31, and 0 through 3. The 11- 
bit exponent field encodes the values 0 through 2047. An exponent value of 0, together with a 
sign bit of 0, is taken to indicate that the G_floating datum has a value of 0. 


If the result of a floating-point instruction has a value of zero, the instruction always produces 
a datum with a sign bit of 0, an exponent of 0, and all fraction bits of 0. Exponent values of 
1..2047 indicate true binary exponents of -1023..1023. An exponent value of 0, together with
a sign bit of 1, is taken as a reserved operand. Floating-point instructions processing a reserved 
operand take a user-visible arithmetic exception. The value of a G_floating datum is in the 
approximate range 0.56*1 0**—308 through 0.9*10**308. The precision of a G_floating 
datum is approximately one part in 2**52, typically 15 decimal digits. See Section 4.7. 
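

The field layout described above can be checked with a short decoder. The following Python sketch is illustrative only and is not part of the architecture definition: the function name g_floating_value is hypothetical, the input is assumed to be the 8 memory bytes starting at address A, and the result is returned as a Python float (an IEEE double), so values near the bottom of the G_floating range lose precision.

```python
import math

def g_floating_value(mem: bytes) -> float:
    """Decode an 8-byte G_floating memory image (byte at address A first)."""
    if len(mem) != 8:
        raise ValueError("a G_floating datum is 8 bytes")
    q = int.from_bytes(mem, "little")       # bit 0 is in the byte at address A
    sign = (q >> 15) & 1                    # bit 15
    exp = (q >> 4) & 0x7FF                  # bits <14:4>, excess-1024
    # Fraction pieces, most significant first: <3:0>, <31:16>, <47:32>, <63:48>.
    frac = (((q & 0xF) << 48)
            | (((q >> 16) & 0xFFFF) << 32)
            | (((q >> 32) & 0xFFFF) << 16)
            | ((q >> 48) & 0xFFFF))
    if exp == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0                          # exponent 0 with sign 0 means zero
    # Restore the hidden most significant fraction bit; the 53-bit
    # fraction 0.1fff... is held here as an integer scaled by 2**53.
    mantissa = (1 << 52) | frac
    value = math.ldexp(mantissa, exp - 1024 - 53)
    return -value if sign else value
```

For example, the memory bytes 10 40 00 00 00 00 00 00 (hexadecimal) hold an exponent field of 1025 and a zero fraction field, which this sketch decodes as 0.5 x 2**1 = 1.0.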


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
G_floating operands that are not naturally aligned. (A naturally aligned G_floating datum 
has zero as the low-order three bits of its address.) 


2.2.5.3 D_floating 


A D_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte boundary. 
The bits are labeled from right to left, 0 through 63, as shown in Figure 2-9. 


Figure 2-9: D_floating Datum 


31             16 15 14        7 6          0
Fraction Midh     S  Exp.        Frac. Hi     :A
Fraction Lo       Fraction Midl               :A+4


A D_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2-10. 


Figure 2-10: D_floating Register Format 


63 62 55 54 48 47 32 31 16 15 0


The reordering of bits required for a D_floating load or store is identical to that required for a
G_floating load or store. The G_floating load and store instructions are therefore used for load- 
ing or storing D_floating data. 


A D_floating datum is specified by its address A, the address of the byte containing bit 0. The 
memory form of a D_floating datum is identical to an F_floating datum except for 32 addi- 
tional low significance fraction bits. Within the fraction, bits of increasing significance are 
from 48 through 63, 32 through 47, 16 through 31, and 0 through 6. The exponent conventions 
and approximate range of values is the same for D_floating as F_floating. The precision of a 
D_floating datum is approximately one part in 2**55, typically 16 decimal digits. 


Notes: 


D_floating is not a fully supported data type; no D_floating arithmetic operations are 
provided in the architecture. For backward compatibility, exact D_floating arithmetic may 
be provided via software emulation. D_floating "format compatibility", in which binary
files of D_floating numbers may be processed, but without the last three bits of fraction 
precision, can be obtained via conversions to G_floating, G arithmetic operations, then 
conversion back to D_floating. 


Alpha implementations will impose a significant performance penalty on access to 
D_floating operands that are not naturally aligned. (A naturally aligned D_floating datum 
has zero as the low-order three bits of its address.) 


2.2.6 IEEE Floating-Point Formats 


The IEEE standard for binary floating-point arithmetic, ANSI/IEEE 754-1985, defines four 
floating-point formats in two groups, basic and extended, each having two widths, single and 
double. The Alpha architecture supports the basic single and double formats, with the basic 
double format serving as the extended single format. The values representable within a format 
are specified by using three integer parameters: 


• P - the number of fraction bits

• Emax - the maximum exponent

• Emin - the minimum exponent

Within each format, only the following entities are permitted:

• Numbers of the form (-1)**S x 2**E x b(0).b(1)b(2)..b(P-1) where:
  - S = 0 or 1
  - E = any integer between Emin and Emax, inclusive
  - b(n) = 0 or 1

• Two infinities - positive and negative

• At least one Signaling NaN

• At least one Quiet NaN


NaN is an acronym for Not-a-Number. A NaN is an IEEE floating-point bit pattern that
represents something other than a number. NaNs come in two forms: Signaling NaNs and Quiet
NaNs. Signaling NaNs are used to provide values for uninitialized variables and for arithmetic
enhancements. Quiet NaNs provide retrospective diagnostic information regarding previous 
invalid or unavailable data and results. Signaling NaNs signal an invalid operation when they 
are an operand to an arithmetic instruction, and may generate an arithmetic exception. Quiet 
NaNs propagate through almost every operation without generating an arithmetic exception. 


Arithmetic with the infinities is handled as if the operands were of arbitrarily large magnitude. 
Negative infinity is less than every finite number; positive infinity is greater than every finite 
number. 


2.2.6.1 S_Floating 


An IEEE single-precision, or S_floating, datum occupies 4 contiguous bytes in memory start- 
ing on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 31, as 
shown in Figure 2-11.


Figure 2-11: S_floating Datum 


31 30 23 22 0 


An S_floating operand occupies 64 bits in a floating register, left-justified in the 64-bit regis- 
ter, as shown in Figure 2-12. 


Figure 2-12: S_floating Register Format 


63 62 52 51 29 28 0 


The S_floating load instruction reorders bits on the way in from memory, expanding the expo- 
nent from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces in the register 
an equivalent T_floating number, suitable for either S_floating or T_floating operations. The 
mapping from 8-bit memory-format exponents to 11-bit register-format exponents is shown in 
Table 2-2. 


Table 2-2: S_floating Load Exponent Mapping (MAP_S) 


Memory <30:23>    Register <62:52>


1 1111111         1 111 1111111
1 xxxxxxx         1 000 xxxxxxx    (xxxxxxx not all 1's)
0 xxxxxxx         0 111 xxxxxxx    (xxxxxxx not all 0's)
0 0000000         0 000 0000000


This mapping preserves both normal values and exceptional values. Note that the mapping for
all 1’s differs from that of F_floating load, since for S_floating all 1’s is an exceptional value 
and for F_floating all 1’s is a normal value. 
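

The MAP_S exponent mapping of Table 2-2 can be expressed as a small function. This is a minimal Python sketch for illustration; the name map_s and the use of plain integers for the exponent fields are assumptions, not part of the architecture.

```python
def map_s(e8: int) -> int:
    """Expand an 8-bit S_floating memory exponent to the 11-bit register exponent."""
    if not 0 <= e8 <= 0xFF:
        raise ValueError("exponent must fit in 8 bits")
    high = e8 >> 7           # memory bit <30>, copied to register bit <62>
    low7 = e8 & 0x7F         # memory bits <29:23>, copied to register bits <58:52>
    if high:
        # 1 xxxxxxx -> 1 000 xxxxxxx, except all 1's, which stays all 1's
        middle = 0b111 if low7 == 0x7F else 0b000
    else:
        # 0 xxxxxxx -> 0 111 xxxxxxx, except all 0's, which stays all 0's
        middle = 0b000 if low7 == 0 else 0b111
    return (high << 10) | (middle << 7) | low7
```

For memory exponents 1 through 254 the result equals the input plus 896 (1023 - 127), which is the change of bias from excess-127 to excess-1023; the values 0 and 255 map to 0 and 2047, so zeros, denormals, infinities, and NaNs keep their meaning.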


The S_floating store instruction reorders register bits on the way to memory and does no 
checking of the low-order fraction bits. Register bits <61:59> and <28:0> are ignored by the 
store instruction. The S_floating load instruction does no checking of the input. 


The S_floating store instruction does no checking of the data; the preceding operation should 
have specified an S_floating result. 


An S_floating datum is specified by its address A, the address of the byte containing bit 0. The
memory form of an S_floating datum is sign magnitude with bit 31 the sign bit, bits <30:23> 
an excess-127 binary exponent, and bits <22:0> a 23-bit fraction. 


The value (V) of an S_floating number is inferred from its constituent sign (S), exponent (E),
and fraction (F) fields as follows:

• If E=255 and F<>0, then V is NaN, regardless of S.

• If E=255 and F=0, then V = (-1)**S x Infinity.

• If 0 < E < 255, then V = (-1)**S x 2**(E-127) x (1.F).

• If E=0 and F<>0, then V = (-1)**S x 2**(-126) x (0.F).

• If E=0 and F=0, then V = (-1)**S x 0 (zero).
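

The five cases above translate directly into code. The following Python sketch is an illustration under stated assumptions: the function name s_floating_value is hypothetical, and the argument is the 32-bit memory image of the datum taken as an unsigned integer.

```python
import math

def s_floating_value(bits: int) -> float:
    """Value of a 32-bit S_floating memory image, following the five cases above."""
    s = (bits >> 31) & 1            # sign, bit 31
    e = (bits >> 23) & 0xFF         # excess-127 exponent, bits <30:23>
    f = bits & 0x7FFFFF             # 23-bit fraction, bits <22:0>
    sign = -1.0 if s else 1.0
    if e == 255:
        # NaN regardless of S when F<>0, otherwise a signed infinity
        return math.nan if f else sign * math.inf
    if e == 0:
        # 2**(-126) x (0.F); F=0 yields a signed zero
        return sign * math.ldexp(f, -126 - 23)
    # 2**(E-127) x (1.F), with the implied leading 1 restored
    return sign * math.ldexp((1 << 23) | f, e - 127 - 23)
```

For instance, the image 3F800000 (hexadecimal) has S=0, E=127, and F=0, so its value is 2**0 x 1.0 = 1.0.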


Floating-point operations on S_floating numbers may take an arithmetic exception for a vari- 
ety of reasons, including invalid operations, overflow, underflow, division by zero, and 
inexact results. 


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
S_floating operands that are not naturally aligned. (A naturally aligned S_floating datum 
has zero as the low-order two bits of its address.) 


2.2.6.2 T_floating 


An IEEE double-precision, or T_floating, datum occupies 8 contiguous bytes in memory start- 
ing on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 63, as 
shown in Figure 2-13. 


Figure 2-13: T_floating Datum 


31 30 20 19 0


A T_floating operand occupies 64 bits in a floating register, arranged as shown in Figure 2-14.


Figure 2-14: T_floating Register Format 


63 62 52 51 32 31 0 


The T_floating load instruction performs no bit reordering on input, nor does it perform check- 
ing of the input data. 


The T_floating store instruction performs no bit reordering on output. This instruction does no 
checking of the data; the preceding operation should have specified a T_floating result. 


A T_floating datum is specified by its address A, the address of the byte containing bit 0. The 
form of a T_floating datum is sign magnitude with bit 63 the sign bit, bits <62:52> an excess- 
1023 binary exponent, and bits <51:0> a 52-bit fraction. 


The value (V) of a T_floating number is inferred from its constituent sign (S), exponent (E), 
and fraction (F) fields as follows: 

• If E=2047 and F<>0, then V is NaN, regardless of S.

• If E=2047 and F=0, then V = (-1)**S x Infinity.

• If 0 < E < 2047, then V = (-1)**S x 2**(E-1023) x (1.F).

• If E=0 and F<>0, then V = (-1)**S x 2**(-1022) x (0.F).

• If E=0 and F=0, then V = (-1)**S x 0 (zero).


Floating-point operations on T_floating numbers may take an arithmetic exception for a vari- 
ety of reasons, including invalid operations, overflow, underflow, division by zero, and 
inexact results. 


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
T_floating operands that are not naturally aligned. (A naturally aligned T_floating datum
has zero as the low-order three bits of its address.) 


2.2.6.3 X_Floating 


Support for 128-bit IEEE extended-precision (X_float) floating-point is initially provided 
entirely through software. This section is included to preserve the intended consistency of 
implementation with other IEEE floating-point data types, should the X_float data type be sup- 
ported in future hardware. 


An IEEE extended-precision, or X_floating, datum occupies 16 contiguous bytes in memory, 
starting on an arbitrary byte boundary. The bits are labeled from right to left, 0 through 127, as 
shown in Figure 2-15.


Figure 2-15: X_floating Datum


63 62          48 47                 0
Fraction_low                           :A
S  Exponent       Fraction_high        :A+8


An X_floating datum occupies two consecutive even/odd floating-point registers (such as
F4/F5), as shown in Figure 2-16.


Figure 2-16: X_floating Register Format


127 126      112 111           64 63             0
S   Exponent     Fraction_high    Fraction_low
    Fn OR 1                       Fn


An X_floating datum is specified by its address A, the address of the byte containing bit 0.
The form of an X_floating datum is sign magnitude with bit 127 the sign bit, bits <126:112>
an excess-16383 binary exponent, and bits <111:0> a 112-bit fraction.


The value (V) of an X_floating number is inferred from its constituent sign (S), exponent (E), 
and fraction (F) fields as follows: 


• If E=32767 and F<>0, then V is a NaN, regardless of S.

• If E=32767 and F=0, then V = (-1)**S x Infinity.

• If 0 < E < 32767, then V = (-1)**S x 2**(E-16383) x (1.F).

• If E=0 and F<>0, then V = (-1)**S x 2**(-16382) x (0.F).

• If E=0 and F=0, then V = (-1)**S x 0 (zero).


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
X_floating operands that are not naturally aligned. (A naturally aligned X_floating datum 
has zero as the low-order four bits of its address.) 


X_Floating Big-Endian Formats 


Section 2.3 describes Alpha support for big-endian data types. It is intended that software or 
hardware implementation for a big-endian X_float data type comply with that support and 
have the following formats.


Figure 2-17: X_floating Big-Endian Datum


(Byte-numbered layout; the second quadword is at A+8 and the last byte is byte 15.)


Figure 2-18: X_floating Big-Endian Register Format


(Bytes 0 through 15 across the register pair.)
Fraction_high    Fraction_low
Fn OR 1          Fn


2.2.7 Longword Integer Format in Floating-Point Unit


A longword integer operand occupies 32 bits in memory, arranged as shown in Figure 2-19. 


Figure 2-19: Longword Integer Datum 


31 30 0
Integer :A


A longword integer operand occupies 64 bits in a floating register, arranged as shown in
Figure 2-20.


Figure 2-20: Longword Integer Floating-Register Format 


63 62 61 59 58 29 28 0 


There is no explicit longword load or store instruction; the S_floating load/store instructions 
are used to move longword data into or out of the floating registers. The register bits <61:59> 
are set by the S_floating load exponent mapping. They are ignored by S_floating store. They 
are also ignored in operands of a longword integer operate instruction, and they are set to 000 
in the result of a longword operate instruction. 
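

The movement of a longword through the S_floating load and store paths can be sketched as follows. This Python illustration is not architectural text: the names lds_longword and sts_longword are hypothetical, registers and memory images are modeled as unsigned integers, and the field positions are taken from Figure 2-20 and the S_floating load and store descriptions.

```python
def lds_longword(mem32: int) -> int:
    """64-bit register image produced when an S_floating load reads a longword."""
    mem32 &= 0xFFFFFFFF
    top2 = mem32 >> 30                   # memory <31:30> -> register <63:62>
    low30 = mem32 & 0x3FFFFFFF           # memory <29:0>  -> register <58:29>
    # Register <61:59> receive whatever the exponent mapping yields
    # for memory bits <30:23>.
    bit30 = (mem32 >> 30) & 1
    low7 = (mem32 >> 23) & 0x7F
    if bit30:
        fill = 0b111 if low7 == 0x7F else 0b000
    else:
        fill = 0b000 if low7 == 0 else 0b111
    return (top2 << 62) | (fill << 59) | (low30 << 29)   # register <28:0> are zero

def sts_longword(reg64: int) -> int:
    """32-bit memory image written by an S_floating store.

    Register bits <61:59> and <28:0> are ignored.
    """
    return (((reg64 >> 62) & 0x3) << 30) | ((reg64 >> 29) & 0x3FFFFFFF)
```

Because the store ignores register bits <61:59>, a load followed by a store returns the original 32 bits whatever the mapping placed in those positions.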


The register format bit <62> "I" in Figure 2-20 is part of the Integer field in Figure 2-19 and
represents the high-order bit of that field.


Note:


Alpha implementations will impose a significant performance penalty when accessing 
longwords that are not naturally aligned. (A naturally aligned longword datum has zero as 
the low-order two bits of its address.) 


2.2.8 Quadword Integer Format in Floating-Point Unit 


A quadword integer operand occupies 64 bits in memory, arranged as shown in Figure 2-21. 


Figure 2-21: Quadword Integer Datum 


31 30 0 


A quadword integer operand occupies 64 bits in a floating register, arranged as shown in Fig- 
ure 2-22. 


Figure 2-22: Quadword Integer Floating-Register Format 


63 62 32 31 0 


There is no explicit quadword load or store instruction; the T_floating load/store instructions 
are used to move quadword data between memory and the floating registers. (The ITOFT and
FTOIT instructions are used to move quadword data between integer and floating registers.)


The T_floating load instruction performs no bit reordering on input. The T_floating store
instruction performs no bit reordering on output. This instruction does no checking of the data;
when used to store quadwords, the preceding operation should have specified a quadword
result.


Note: 


Alpha implementations will impose a significant performance penalty when accessing 
quadwords that are not naturally aligned. (A naturally aligned quadword datum has zero as 
the low-order three bits of its address.) 


2.2.9 Data Types with No Hardware Support 
The following VAX data types are not directly supported in Alpha hardware. \See the DEC 
STD 032: VAX Architecture Standard for detailed information on these data types.\ 
• Octaword
• H_floating
• D_floating (except load/store and convert to/from G_floating)
• Variable-Length Bit Field
• Character String
• Trailing Numeric String
• Leading Separate Numeric String
• Packed Decimal String


2.3 Big-Endian Addressing Support 
Alpha implementations may include optional big-endian addressing support. 
In a little-endian machine, the bytes within a quadword are numbered right to left: 


Figure 2-23: Little-Endian Byte Addressing


7 6 5 4 3 2 1 0


In a big-endian machine, they are numbered left to right: 


Figure 2-24: Big-Endian Byte Addressing


0 1 2 3 4 5 6 7


Bit numbering within bytes is not affected by the byte numbering convention (big-endian or lit- 
tle-endian). 


The format for the X_floating big-endian data type is shown in Section 2.2.6.3. 


The byte numbering convention does not matter when accessing complete aligned quadwords 
in memory. However, the numbering convention does matter when accessing smaller or 
unaligned quantities, or when manipulating data in registers, as follows: 


• A quadword load or store of data at location 0 moves the same eight bytes under both
numbering conventions. However, a longword load or store of data at location 4 must
move the leftmost half of a quadword under the little-endian convention, and the
rightmost half under the big-endian convention. Thus, to support both conventions, the
convention being used must be known and it must affect longword load/store operations.


• A byte extract of byte 5 from a quadword of data into the low byte of a register requires
a right shift of 5 bytes under the little-endian convention, but a right shift of 2 bytes
under the big-endian convention.


• Manipulation of data in a register is almost the same for both conventions. In both,
integer and floating-point data have their sign bits in the leftmost byte and their least
significant bit in the rightmost byte, so the same integer and floating-point instructions are
used unchanged for both conventions. Big-endian character strings have their most
significant character on the left, while little-endian strings have their most significant
character on the right.


• The compare byte (CMPBGE) instruction is neutral about direction, doing eight byte
compares in parallel. However, the code that follows the CMPBGE instruction and
examines the byte mask to determine which string is larger differs between the two
conventions, depending on whether the rightmost or leftmost unequal byte is used. Thus,
compilers must be instructed to generate somewhat different code sequences for the two
conventions.
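

The byte-extract case in the list above can be worked through numerically. The following Python sketch is only an illustration: the helper names byte_shift and extract_byte are hypothetical, and a register is modeled as a 64-bit unsigned integer whose leftmost byte is the most significant.

```python
def byte_shift(n: int, big_endian: bool) -> int:
    """Right-shift amount, in bits, that moves byte n into the low byte."""
    if not 0 <= n <= 7:
        raise ValueError("byte number must be 0..7")
    # Little-endian: byte n sits n bytes from the right.
    # Big-endian: byte n sits n bytes from the left, that is, 7-n from the right.
    return 8 * ((n ^ 7) if big_endian else n)

def extract_byte(q: int, n: int, big_endian: bool) -> int:
    """Extract byte n of quadword q into the low byte of the result."""
    return (q >> byte_shift(n, big_endian)) & 0xFF
```

With n = 5 the shift is 5 bytes (40 bits) under the little-endian convention and 2 bytes (16 bits) under the big-endian convention, matching the figures quoted in the list. For a 3-bit byte number, 7-n is the same as inverting the three bits, which is why the sketch uses n ^ 7.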


Implementations that include big-endian support must supply all of the following features: 


• A means at boot time to choose the byte numbering convention. The implementation is
not required to support dynamically changing the convention during program execution.
The chosen convention applies to all code executed, both operating-system and
user.


• If the big-endian convention is chosen, the longword-length load/store instructions
(LDF, LDL, LDL_L, LDS, STF, STL, STL_C, STS) invert bit va<2> (bit 2 of the virtual
address). This has the effect of accessing the half of a quadword other than the half
that would be accessed under the little-endian convention.


• If the big-endian convention is chosen, the word-length load instruction, LDWU,
inverts bits va<1:2> (bits 1 and 2 of the virtual address). This has the effect of accessing
the half of the longword other than the half that would be accessed under the
little-endian convention.


• If the big-endian convention is chosen, the byte-length load instruction, LDBU, inverts
bits va<0:2> (bits 0 through 2 of the virtual address). This has the effect of accessing
the half of the word other than the half that would be accessed under the little-endian
convention.


• If the big-endian convention is chosen, the byte manipulation instructions (EXTxx,
INSxx, MSKxx) invert bits Rbv<2:0>. This has the effect of changing a shift of 5 bytes
into a shift of 2 bytes, for example.
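

The address inversions in the list above can be modeled in a few lines. This Python sketch is an assumption-laden illustration, not a description of any implementation: the function names are hypothetical, only loads are modeled, memory is a byte string indexed by the little-endian address, and the expression 8 - size is simply a compact way of producing the masks 4, 6, and 7 for longword, word, and byte accesses.

```python
def effective_address(va: int, size: int, big_endian: bool) -> int:
    """Address actually used for an aligned access of `size` bytes."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    if va % size:
        raise ValueError("this sketch models naturally aligned accesses only")
    # Longword: invert va<2>; word: invert va<2:1>; byte: invert va<2:0>.
    # A quadword access is unchanged (8 - 8 = 0).
    return va ^ (8 - size) if big_endian else va

def load(memory: bytes, va: int, size: int, big_endian: bool) -> int:
    """Zero-extended load of `size` bytes under the chosen convention."""
    ea = effective_address(va, size, big_endian)
    return int.from_bytes(memory[ea:ea + size], "little")
```

Taking the quadword 0011223344556677 (hexadecimal) stored at location 0, a longword load at location 4 returns 00112233 (the leftmost half) under the little-endian convention and 44556677 (the rightmost half) under the big-endian convention, while a quadword load returns the same value under both.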


The instruction stream is always considered to be little-endian, and is independent of the cho- 
sen byte numbering convention. Compilers, linkers, and debuggers must be aware of this 
when accessing an instruction stream using data-stream load/store instructions. Thus, the right- 
most instruction in a quadword is always executed first and always has the instruction-stream 
address 0 MOD 8. The same bytes accessed by a longword load/store instruction have data- 
stream address 0 MOD 8 under the little-endian convention, and 4 MOD 8 under the big- 
endian convention. 


Using either byte numbering convention, it is sometimes necessary to access data that origi- 
nated on a machine that used the other convention. When this occurs, it is often necessary to 
swap the bytes within a datum. See Appendix A, Byte Swap, for a suggested code sequence.


\Note:


The following Alpha implementations have only partial big-endian support; they have the
first two features, but they do not invert bits Rbv<2:0> in the byte manipulation 
instructions: DECchip 21064, 21064A, 21066, 21068. These non-conforming 
implementations are grandfathered. 


The design above largely uses data-path logic that must already exist in a little-endian 
implementation and avoids adding instruction-path logic that would not normally exist. In 
particular, a little-endian implementation of longword-length load/store must be able to 
choose either half of a quadword; big-endian support simply alters the choice function. On 
the other hand, a little-endian implementation would not naturally have logic to invert the 
execution order of instructions within a quadword, so none is added for big-endian 
support.\ 


DIGITAL Restricted Distribution 




2.4 \Revision History 


Revision 7.0, November 10, 1997 
1. Added ECO 81 
2. Alpha AXP —> Alpha 


Revision 6.0, December, 1994 

Added ECO 82, IEEE X_float 

Added ECO 76, Bi-endian support 

Added MAP_x operator to F_float and S_float 

Alpha —-> Alpha AXP 

Changed fig 2-16 to agree with updated fig 2-15 (and text) 
Changed fig 2-18 to agree with updated fig 2-17 




Update the datum illustrations to show 32 bits 


Revision 5.0, May 12, 1992 
1. Converted to SDML 


Revision 4.0, March 29, 1991 
D_floating point support removed
Typos
Word definition made homologous to longword, quadword
Specify no checking on S_floating load, and T_floating load
Removed S_floating Format illustration and text
Clarified what is meant by a VAX floating point instruction


Revision 3.0, March 2, 1990 


1. Cosmetic change to floating-point pictures 


Revision 2.0, October 4, 1989 
1. No change


Revision 1.0, May 23, 1989 
1. Change minimum virtual address size to 40 bits 
2. Change Floating-point register format 


3. Remove alignment warning on word data type 


Revision 0.0, March 15, 1989 


1. Initial version\


Chapter 3


Instruction Formats (I)


3.1 Alpha Registers 


Each Alpha processor has a set of registers that hold the current processor state. If an Alpha 
system contains multiple Alpha processors, there are multiple per-processor sets of these 
registers. 


3.1.1 Program Counter 


The Program Counter (PC) is a special register that addresses the instruction stream. As each 
instruction is decoded, the PC is advanced to the next sequential instruction. This is referred to 
as the updated PC. Any instruction that uses the value of the PC will use the updated PC. The 
PC includes only bits <63:2> with bits <1:0> treated as RAZ/IGN. This quantity is a long- 
word-aligned byte address. The PC is an implied operand on conditional branch and 
subroutine jump instructions. The PC is not accessible as an integer register. 


3.1.2 Integer Registers 


There are 32 integer registers (R0 through R31), each 64 bits wide.


Register R31 is assigned special meaning by the Alpha architecture. When R31 is specified as 
a register source operand, a zero-valued operand is supplied. 


For all cases except the Unconditional Branch and Jump instructions, results of an instruction 
that specifies R31 as a destination operand are discarded. Also, it is UNPREDICTABLE 
whether the other destination operands (implicit and explicit) are changed by the instruction. It 
is implementation dependent to what extent the instruction is actually executed once it has 
been fetched. An exception is never signaled for a load that specifies R31 as a destination
operand. For all other operations, it is UNPREDICTABLE whether exceptions are signaled
during the execution of such an instruction. Note, however, that exceptions associated with the 
instruction fetch of such an instruction are always signaled. 


Implementation note: 


As described in Appendix A, certain load instructions to an R31 destination are the 
preferred method for performing a cache block prefetch.


There are some interesting cases involving R31 as a destination:

• STx_C R31,disp(Rb)


Although this might seem like a good way to zero out a shared location and reset the
lock_flag, this instruction causes the lock_flag and virtual location {Rbv +
SEXT(disp)} to become UNPREDICTABLE.


• LDx_L R31,disp(Rb)


This instruction produces no useful result since it causes both lock_flag and
locked_physical_address to become UNPREDICTABLE.


Unconditional Branch (BR and BSR) and Jump (JMP, JSR, RET, and JSR_COROUTINE) 
instructions, when R31 is specified as the Ra operand, execute normally and update the PC 
with the target virtual address. Of course, no PC value can be saved in R31. 


3.1.3 Floating-Point Registers 


There are 32 floating-point registers (F0 through F31), each 64 bits wide.


When F31 is specified as a register source operand, a true zero-valued operand is supplied. 
See Section 4.7.3 for a definition of true zero. 


Results of an instruction that specifies F31 as a destination operand are discarded and it is 
UNPREDICTABLE whether the other destination operands (implicit and explicit) are 
changed by the instruction. In this case, it is implementation-dependent to what extent the 
instruction is actually executed once it has been fetched. An exception is never signaled for a 
load that specifies F31 as a destination operand. For all other operations, it is
UNPREDICTABLE whether exceptions are signaled during the execution of such an instruction. Note,
however, that exceptions associated with the instruction fetch of such an instruction are 
always signaled. 


Implementation note: 


As described in Appendix A, certain load instructions to an F31 destination are the 
preferred method for signalling a cache block prefetch.


A floating-point instruction that operates on single-precision data reads all bits <63:0> of the 
source floating-point register. A floating-point instruction that produces a single-precision 
result writes all bits <63:0> of the destination floating-point register. 


3.1.4 Lock Registers 


There are two per-processor registers associated with the LDx_L and STx_C instructions, the 
lock_flag and the locked_physical_address register. The use of these registers is described in 
Section 4.2.


3.1.5 Processor Cycle Counter (PCC) Register


The PCC register consists of two 32-bit fields. The low-order 32 bits (PCC<31:0>) are an 
unsigned wrapping counter, PCC_CNT. The high-order 32 bits (PCC<63:32>), PCC_OFF, are 
operating system dependent in their implementation. 


PCC_CNT is the base clock register for measuring time intervals and is suitable for timing
intervals on the order of nanoseconds. 


PCC_CNT increments once per N CPU cycles, where N is an implementation-specific integer 
in the range 1..16. The cycle counter frequency is the number of times the processor cycle 
counter gets incremented per second. The integer count wraps to 0 from a count of FFFF
FFFF (hexadecimal). The counter wraps no more frequently than 1.5 times the implementation's
interval clock interrupt period (which is two thirds of the interval clock interrupt frequency),
which guarantees that an interrupt occurs before PCC_CNT overflows twice.


PCC_OFF need not contain a value related to time and could contain all zeros in a simple 
implementation. However, if PCC_OFF is used to calculate a per-process or per-thread cycle 
count, it must contain a value that, when added to PCC_CNT, returns the total PCC register 
count for that process or thread, modulo 2**32. 
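

The modulo 2**32 arithmetic on the two PCC fields can be sketched as follows. This Python fragment is an illustration only: the function names are hypothetical, the 64-bit PCC value is assumed to have been obtained already (for example by the RPCC instruction), and the elapsed-count helper is valid only when fewer than 2**32 counts separate the two readings.

```python
MASK32 = 0xFFFFFFFF

def split_pcc(pcc: int) -> tuple:
    """Return (PCC_CNT, PCC_OFF) from a 64-bit PCC value."""
    return pcc & MASK32, (pcc >> 32) & MASK32

def elapsed_counts(start_cnt: int, end_cnt: int) -> int:
    """Counts between two PCC_CNT readings, tolerating one wrap through zero."""
    return (end_cnt - start_cnt) & MASK32

def process_count(pcc: int) -> int:
    """Per-process or per-thread count: PCC_OFF + PCC_CNT, modulo 2**32."""
    cnt, off = split_pcc(pcc)
    return (cnt + off) & MASK32
```

For example, readings of FFFFFFF0 and 00000010 (hexadecimal) are 20 (hexadecimal) counts apart even though the counter wrapped to 0 between them.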


Implementation Note: 
OpenVMS Alpha and DIGITAL UNIX supply a per-process value in PCC_OFF. 


PCC is required on all implementations. It is required for every processor, and each processor 
on a multiprocessor system has its own private, independent PCC. 


The PCC is read by the RPCC instruction. See Section 4.11.8. 


3.1.6 Optional Registers 


Some Alpha implementations may include optional memory prefetch or VAX compatibility 
processor registers. 


3.1.6.1 Memory Prefetch Registers 


If the prefetch instructions FETCH and FETCH_M are implemented, an implementation will 
include two sets of state prefetch registers used by those instructions. The use of these regis- 
ters is described in Section 4.11. These registers are not directly accessible by software and are 
listed for completeness. 


3.1.6.2 VAX Compatibility Register 


The VAX compatibility instructions RC and RS include the intr_flag register, as described in 
Section 4.12. 


3.2 Notation 


The notation used to describe the operation of each instruction is given as a sequence of
control and assignment statements in an ALGOL-like syntax.


3.2.1 Operand Notation


Tables 3-1, 3-2, and 3-3 list the notation for the operands, the operand values, and the other
expression operands. 


Table 3-1: Operand Notation 


Notation    Meaning


Ra          An integer register operand in the Ra field of the instruction
Rb          An integer register operand in the Rb field of the instruction
#b          An integer literal operand in the Rb field of the instruction
Rc          An integer register operand in the Rc field of the instruction
Fa          A floating-point register operand in the Ra field of the instruction
Fb          A floating-point register operand in the Rb field of the instruction
Fc          A floating-point register operand in the Rc field of the instruction


Table 3-2: Operand Value Notation 


Notation    Meaning


Rav         The value of the Ra operand. This is the contents of register Ra.
Rbv         The value of the Rb operand. This could be the contents of register Rb, or
            a zero-extended 8-bit literal in the case of an Operate format instruction.
Fav         The value of the floating point Fa operand. This is the contents of register
            Fa.
Fbv         The value of the floating point Fb operand. This is the contents of register
            Fb.


Table 3-3: Expression Operand Notation 


Notation        Meaning


IPR_x           Contents of Internal Processor Register x
IPR_SP[mode]    Contents of the per-mode stack pointer selected by mode
PC              Updated PC value
Rn              Contents of integer register n
Fn              Contents of floating-point register n
X[m]            Element m of array X




3.2.2 Instruction Operand Notation 


The notation used to describe instruction operands follows from the operand specifier notation 
used in the VAX Architecture Standard. Instruction operands are described as follows: 


<name>.<access type><data type> 


3.2.2.1 Operand Name Notation 


Specifies the instruction field (Ra, Rb, Rc, or disp) and register type of the operand (integer or 
floating). It can be one of the following: 


Table 3-4: Operand Name Notation 


Name        Meaning


disp        The displacement field of the instruction
fnc         The PALcode function field of the instruction
Ra          An integer register operand in the Ra field of the instruction
Rb          An integer register operand in the Rb field of the instruction
#b          An integer literal operand in the Rb field of the instruction
Rc          An integer register operand in the Rc field of the instruction
Fa          A floating-point register operand in the Ra field of the instruction
Fb          A floating-point register operand in the Rb field of the instruction
Fc          A floating-point register operand in the Rc field of the instruction


3.2.2.2 Operand Access Type Notation 


A letter that denotes the operand access type: 


Table 3-5: Operand Access Type Notation 


Access Type    Meaning


a              The operand is used in an address calculation to form an effective
               address. The data type code that follows indicates the units of
               addressability (or scale factor) applied to this operand when the
               instruction is decoded.

               For example:

               ".al" means scale by 4 (longwords) to get byte units (used in branch
               displacements); ".ab" means the operand is already in byte units (used in
               load/store instructions).

i              The operand is an immediate literal in the instruction.
r              The operand is read only.
m              The operand is both read and written.
w              The operand is write only.


3.2.2.3 Operand Data Type Notation 


A letter that denotes the data type of the operand: 


Table 3-6: Operand Data Type Notation


Data Type    Meaning


b            Byte
f            F_floating
g            G_floating
l            Longword
q            Quadword
s            IEEE single floating (S_floating)
t            IEEE double floating (T_floating)
w            Word
x            The data type is specified by the instruction


3.2.3 Operators


Table 3-7 describes the operators: 

Table 3-7: Operators


Operator                 Meaning

!                        Comment delimiter
+                        Addition
-                        Subtraction
*                        Signed multiplication
*U                       Unsigned multiplication
**                       Exponentiation (left argument raised to right argument)
/                        Division
<-                       Replacement
||                       Bit concatenation
{}                       Indicates explicit operator precedence
(x)                      Contents of memory location whose address is x
x<m:n>                   Contents of bit field of x defined by bits n through m
x<m>                     M'th bit of x

ACCESS(x,y)              Accessibility of the location whose address is x using the
                         access mode y. Returns a Boolean value TRUE if the
                         address is accessible, else FALSE.

AND                      Logical product

ARITH_RIGHT_SHIFT(x,y)   Arithmetic right shift of first operand by the second
                         operand. Y is an unsigned shift value. Bit 63, the sign bit,
                         is copied into vacated bit positions and shifted out bits
                         are discarded.

BYTE_ZAP(x,y)            X is a quadword, y is an 8-bit vector in which each bit
                         corresponds to a byte of the result. The y bit to x byte
                         correspondence is y<n> <-> x<8n+7:8n>. This
                         correspondence also exists between y and the result.

                         For each bit of y from n = 0 to 7, if y<n> is 0 then byte
                         <n> of x is copied to byte <n> of result, and if y<n> is 1
                         then byte <n> of result is forced to all zeros.

CASE                     The CASE construct selects one of several actions based
                         on the value of its argument. The form of a case is:

                         CASE argument OF
                             argvalue1: action_1
                             argvalue2: action_2
                             ...
                             argvaluen: action_n
                             [otherwise: default_action]
                         ENDCASE

                         If the value of argument is argvalue1 then action_1 is
                         executed; if argument = argvalue2, then action_2 is executed,
                         and so forth.

                         Once a single action is executed, the code stream breaks
                         to the ENDCASE (there is an implicit break as in Pascal).
                         Each action may nonetheless be a sequence of
                         pseudocode operations, one operation per line.

                         Optionally, the last argvalue may be the atom 'otherwise'.
                         The associated default action will be taken if none of the
                         other argvalues match the argument.

DIV                      Integer division (truncates)

LEFT_SHIFT(x,y)          Logical left shift of first operand by the second operand. Y
                         is an unsigned shift value. Zeros are moved into the
                         vacated bit positions, and shifted out bits are discarded.

LOAD_LOCKED              The processor records the target physical address in a
                         per-processor locked_physical_address register and sets the
                         per-processor lock_flag.

lg                       Log to the base 2.

MAP_x                    F_float or S_float memory-to-register exponent mapping
                         function.

MAXS(x,y)                Returns the larger of x and y, with x and y interpreted as
                         signed integers.

MAXU(x,y)                Returns the larger of x and y, with x and y interpreted as
                         unsigned integers.

MINS(x,y)                Returns the smaller of x and y, with x and y interpreted as
                         signed integers.

MINU(x,y)                Returns the smaller of x and y, with x and y interpreted as
                         unsigned integers.

x MOD y                  x modulo y

NOT                      Logical (ones) complement

OR                       Logical sum

PHYSICAL_ADDRESS         Translation of a virtual address

PRIORITY_ENCODE          Returns the bit position of most significant set bit,
                         interpreting its argument as a positive integer
                         (=int(lg(x))). For example:

                         priority_encode( 255 ) = 7

Relational Operators:

                         Operator   Meaning

                         LT         Less than signed
                         LTU        Less than unsigned
                         LE         Less or equal signed
                         LEU        Less or equal unsigned
                         EQ         Equal signed and unsigned
                         NE         Not equal signed and unsigned
                         GE         Greater or equal signed
                         GEU        Greater or equal unsigned
                         GT         Greater signed
                         GTU        Greater unsigned
                         LBC        Low bit clear
                         LBS        Low bit set

RIGHT_SHIFT(x,y)         Logical right shift of first operand by the second operand.
                         Y is an unsigned shift value. Zeros are moved into
                         vacated bit positions, and shifted out bits are discarded.

SEXT(x)                  X is sign-extended to the required size.

STORE_CONDITIONAL        If the lock_flag is set, then do the indicated store and clear
                         the lock_flag.

TEST(x,cond)             The contents of register x are tested for branch condition
                         (cond) true. TEST returns a Boolean value TRUE if x
                         bears the specified relation to 0, else FALSE is returned.
                         Integer and floating test conditions are drawn from the
                         preceding list of relational operators.

XOR                      Logical difference

ZEXT(x)                  X is zero-extended to the required size.
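

Several of the operators in Table 3-7 can be modeled in a few lines of ordinary code. The
following is a minimal Python sketch, not part of the architecture definition. It assumes that
every value is held as an unsigned 64-bit integer and that shift counts are in the range 0 to
63; the function names and the explicit bits argument of sext and zext are illustrative
choices, not architected names.

```python
MASK64 = (1 << 64) - 1  # values are modeled as unsigned 64-bit quadwords


def sext(value, bits):
    """SEXT: sign-extend the low `bits` bits of value to 64 bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):                      # sign bit of the field is set
        value |= MASK64 ^ ((1 << bits) - 1)      # fill the upper bits with ones
    return value


def zext(value, bits):
    """ZEXT: keep the low `bits` bits; the upper bits become zero."""
    return value & ((1 << bits) - 1)


def arith_right_shift(x, y):
    """ARITH_RIGHT_SHIFT: bit 63 is copied into the vacated positions."""
    x &= MASK64
    fill = (MASK64 << (64 - y)) & MASK64 if x >> 63 else 0
    return (x >> y) | fill


def byte_zap(x, y):
    """BYTE_ZAP: byte n of the result is zero if y<n> is 1, else byte n of x."""
    result = 0
    for n in range(8):
        if not (y >> n) & 1:
            result |= x & (0xFF << (8 * n))
    return result & MASK64


def priority_encode(x):
    """PRIORITY_ENCODE: bit position of the most significant set bit (x > 0)."""
    return x.bit_length() - 1
```

For instance, priority_encode(255) evaluates to 7, matching the example in the table.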


3.2.4 Notation Conventions 


The following conventions are used: 


Only operands that appear on the left side of a replacement operator are modified. 


No operator precedence is assumed other than that replacement (<-) has the lowest pre- 
cedence. Explicit precedence is indicated by the use of "{ }". 


All arithmetic, logical, and relational operators are defined in the context of their oper- 
ands. For example, "+" applied to G_floating operands means a G_floating add, 
whereas "+" applied to quadword operands is an integer add. Similarly, "LT" is a 
G_floating comparison when applied to G_floating operands and an integer comparison 
when applied to quadword operands. 


3.3 Instruction Formats 


There are five basic Alpha instruction formats: 


Memory 

Branch 

Operate 
Floating-point Operate 
PALcode 


All instruction formats are 32 bits long with a 6-bit major opcode field in bits <31:26> of the 
instruction. 


Any unused register field (Ra, Rb, Fa, Fb) of an instruction must be set to a value of 31. 


Software Note: 


There are several instructions, each formatted as a memory instruction, that do not use the
Ra and/or Rb fields. These instructions are: Memory Barrier, Fetch, Fetch_M, Read
Process Cycle Counter, Read and Clear, Read and Set, and Trap Barrier.


3.3.1 Memory Instruction Format


The Memory format is used to transfer data between registers and memory, to load an effec- 
tive address, and for subroutine jumps. It has the format shown in Figure 3-1. 


Figure 3-1: Memory Instruction Format 


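 31      26 25   21 20   16 15                             0
+----------+-------+-------+--------------------------------+
|  Opcode  |  Ra   |  Rb   |          Memory_disp           |
+----------+-------+-------+--------------------------------+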


A Memory format instruction contains a 6-bit opcode field, two 5-bit register address fields, 
Ra and Rb, and a 16-bit signed displacement field. 


The displacement field is a byte offset. It is sign-extended and added to the contents of register 
Rb to form a virtual address. Overflow is ignored in this calculation. 


The virtual address is used as a memory load/store address or a result value, depending on the 
specific instruction. The virtual address (va) is computed as follows for all memory format 
instructions except the load address high (LDAH): 


    va <- {Rbv + SEXT(Memory_disp)}


For LDAH the virtual address (va) is computed as follows:


    va <- {Rbv + SEXT(Memory_disp*65536)}
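

The field extraction and the two address computations can be sketched as follows. This is an
illustrative Python model under stated assumptions, not an architected algorithm: instruction
words and register values are plain integers, the helper names are hypothetical, and the
sample instruction word in the usage note is arbitrary.

```python
MASK64 = (1 << 64) - 1  # addresses wrap at 64 bits; overflow is ignored


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to 64 bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value |= MASK64 ^ ((1 << bits) - 1)
    return value


def decode_memory_format(inst):
    """Split a 32-bit Memory format instruction word into its four fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,   # bits <31:26>
        "ra": (inst >> 21) & 0x1F,       # bits <25:21>
        "rb": (inst >> 16) & 0x1F,       # bits <20:16>
        "disp": inst & 0xFFFF,           # bits <15:0>
    }


def memory_va(rbv, disp, ldah=False):
    """va <- Rbv + SEXT(disp), or Rbv + SEXT(disp*65536) for LDAH."""
    if ldah:
        offset = sext(disp * 65536, 32)  # the scaled displacement is 32 bits wide
    else:
        offset = sext(disp, 16)
    return (rbv + offset) & MASK64
```

With Rbv = 0x1000 and a displacement field of 0xFFF8 (that is, -8), memory_va returns 0x0FF8;
with ldah=True and a displacement field of 1, it returns Rbv + 0x10000.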


3.3.1.1 Memory Format Instructions with a Function Code 


Memory format instructions with a function code replace the memory displacement field in 
the memory instruction format with a function code that designates a set of miscellaneous 
instructions. The format is shown in Figure 3-2. 


Figure 3-2: Memory Instruction with Function Code Format 


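 31      26 25   21 20   16 15                             0
+----------+-------+-------+--------------------------------+
|  Opcode  |  Ra   |  Rb   |            Function            |
+----------+-------+-------+--------------------------------+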


The memory instruction with function code format contains a 6-bit opcode field and a 16-bit 
function field. Unused function codes produce UNPREDICTABLE but not UNDEFINED 
results; they are not security holes. 


There are two fields, Ra and Rb. The usage of those fields depends on the instruction. See
Section 4.11.


3.3.1.2 Memory Format Jump Instructions


For computed branch instructions (CALL, RET, JMP, JSR_COROUTINE) the displacement
field is used to provide branch-prediction hints as described in Section 4.3.


3.3.2 Branch Instruction Format 


The Branch format is used for conditional branch instructions and for PC-relative subroutine 
jumps. It has the format shown in Figure 3-3. 


Figure 3-3: Branch Instruction Format 


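 31      26 25   21 20                                     0
+----------+-------+----------------------------------------+
|  Opcode  |  Ra   |              Branch_disp               |
+----------+-------+----------------------------------------+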


A Branch format instruction contains a 6-bit opcode field, one 5-bit register address field (Ra), 
and a 21-bit signed displacement field. 


The displacement is treated as a longword offset. This means it is shifted left two bits (to 
address a longword boundary), sign-extended to 64 bits, and added to the updated PC to form 
the target virtual address. Overflow is ignored in this calculation. The target virtual address 
(va) is computed as follows: 


    va <- PC + {4*SEXT(Branch_disp)}
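

As an illustration, the target computation can be modeled as below. This is a hedged Python
sketch: it assumes the updated PC is the address of the branch instruction plus 4, and the
function name branch_target is hypothetical.

```python
MASK64 = (1 << 64) - 1  # overflow is ignored, so the result wraps at 64 bits


def branch_target(branch_address, inst):
    """Compute va <- PC + 4*SEXT(Branch_disp) for a Branch format word."""
    disp = inst & 0x1FFFFF                 # 21-bit displacement, bits <20:0>
    if disp & (1 << 20):                   # sign bit of the displacement
        disp -= 1 << 21
    updated_pc = branch_address + 4        # assumption: PC already points past the branch
    return (updated_pc + 4 * disp) & MASK64
```

A displacement of -1 therefore branches back to the branch instruction itself, and a
displacement of 0 falls through to the next instruction.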


3.3.3 Operate Instruction Format


The Operate format is used for instructions that perform integer register to integer register
operations. The Operate format allows the specification of one destination operand and two
source operands. One of the source operands can be a literal constant. The Operate format in
Figure 3-4 shows the two cases when bit <12> of the instruction is 0 and 1.


Figure 3-4: Operate Instruction Format


 31      26 25   21 20   16 15 13 12  11           5 4     0
+----------+-------+-------+-----+---+--------------+-------+
|  Opcode  |  Ra   |  Rb   | SBZ | 0 |   Function   |  Rc   |
+----------+-------+-------+-----+---+--------------+-------+

 31      26 25   21 20         13 12  11           5 4     0
+----------+-------+-------------+---+--------------+-------+
|  Opcode  |  Ra   |     LIT     | 1 |   Function   |  Rc   |
+----------+-------+-------------+---+--------------+-------+


An Operate format instruction contains a 6-bit opcode field and a 7-bit function code field.
Unused function codes for opcodes defined as reserved in the Version 5 Alpha architecture
specification (May 1992) produce an illegal instruction trap. Those opcodes are 01, 02, 03, 04,
05, 06, 07, 0A, 0C, 0D, 0E, 14, 19, 1B, 1D, 1E, and 1F. For other opcodes, unused function
codes produce UNPREDICTABLE but not UNDEFINED results; they are not security holes.


There are three operand fields, Ra, Rb, and Rc. 


The Ra field specifies a source operand. Symbolically, the integer Rav operand is formed as 
follows: 


    IF inst<25:21> EQ 31 THEN
        Rav <- 0
    ELSE
        Rav <- Ra
    END


The Rb field specifies a source operand. Integer operands can specify a literal or an integer 
register using bit <12> of the instruction. 


If bit <12> of the instruction is 0, the Rb field specifies a source register operand. 


If bit <12> of the instruction is 1, an 8-bit zero-extended literal constant is formed by bits 
<20:13> of the instruction. The literal is interpreted as a positive integer between 0 and 255 
and is zero-extended to 64 bits. Symbolically, the integer Rbv operand is formed as follows: 


    IF inst<12> EQ 1 THEN
        Rbv <- ZEXT(inst<20:13>)
    ELSE
        IF inst<20:16> EQ 31 THEN
            Rbv <- 0
        ELSE
            Rbv <- Rb
        END
    END


The Rc field specifies a destination operand. 
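

The operand formation rules above can be collected into one routine. The following Python
sketch is illustrative only: it assumes the integer register file is a list of 32 integers, that
the Rc field occupies bits <4:0> of the instruction, and the name operate_operands is
hypothetical.

```python
def operate_operands(inst, regs):
    """Return (Rav, Rbv, Rc number) for an Operate format instruction word."""
    ra = (inst >> 21) & 0x1F
    rav = 0 if ra == 31 else regs[ra]          # register 31 reads as zero

    if (inst >> 12) & 1:
        rbv = (inst >> 13) & 0xFF              # 8-bit literal, zero-extended
    else:
        rb = (inst >> 16) & 0x1F
        rbv = 0 if rb == 31 else regs[rb]

    rc = inst & 0x1F                           # destination register number
    return rav, rbv, rc
```

A caller would apply the instruction's operation to Rav and Rbv and write the result to
register Rc.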


3.3.4 Floating-Point Operate Instruction Format 


The Floating-point Operate format is used for instructions that perform floating-point register 
to floating-point register operations. The Floating-point Operate format allows the specifica- 
tion of one destination operand and two source operands. The Floating-point Operate format is 
shown in Figure 3-5. 


Figure 3-5: Floating-Point Operate Instruction Format 


 31      26 25   21 20   16 15                     5 4     0
+----------+-------+-------+------------------------+-------+
|  Opcode  |  Fa   |  Fb   |        Function        |  Fc   |
+----------+-------+-------+------------------------+-------+


A Floating-point Operate format instruction contains a 6-bit opcode field and an 11-bit
function field. Unused function codes for those opcodes defined as reserved in the Version 5
Alpha architecture specification (May 1992) produce an illegal instruction trap. Those opcodes
are 01, 02, 03, 04, 05, 06, 07, 14, 19, 1B, 1D, 1E, and 1F. For other opcodes, unused function
codes produce UNPREDICTABLE but not UNDEFINED results; they are not security holes.


There are three operand fields, Fa, Fb, and Fc. Each operand field specifies either an integer or 
floating-point operand as defined by the instruction. 


The Fa field specifies a source operand. Symbolically, the Fav operand is formed as follows: 


    IF inst<25:21> EQ 31 THEN
        Fav <- 0
    ELSE
        Fav <- Fa
    END


The Fb field specifies a source operand. Symbolically, the Fbv operand is formed as follows:


    IF inst<20:16> EQ 31 THEN
        Fbv <- 0
    ELSE
        Fbv <- Fb
    END


Note: 


Neither Fa nor Fb can be a literal in Floating-point Operate instructions. 


The Fc field specifies a destination operand.


3.3.4.1 Floating-Point Convert Instructions


Floating-point Convert instructions use a subset of the Floating-point Operate format and
perform register-to-register conversion operations. The Fb operand specifies the source; the Fa
field must be F31.


3.3.4.2 Floating-Point/Integer Register Moves 


Instructions that move data between a floating-point register file and an integer register file are
a subset of the Floating-point Operate format. The unused source field must be 31.


3.3.5 PALcode Instruction Format 


The Privileged Architecture Library (PALcode) format is used to specify extended processor
functions. It has the format shown in Figure 3-6.


Figure 3-6: PALcode Instruction Format


 31      26 25                                             0
+----------+------------------------------------------------+
|  Opcode  |                PALcode Function                |
+----------+------------------------------------------------+


The 26-bit PALcode function field specifies the operation. The source and destination
operands for PALcode instructions are supplied in fixed registers that are specified in the
individual instruction descriptions.


An opcode of zero and a PALcode function of zero specify the HALT instruction.


3.4 Revision History


Revision 7.0, November 10, 1997

1. Added ECO 90, MINS, MAXS, and MAXU operators
2. Added ECO 95, change to R31/F31 exceptions with loads
3. Added ECO 81
4. Alpha AXP -> Alpha


Revision 6.0, December 1994

1. Added ECO 69, PCC register
2. Added MAP_x operator
3. Added ECO 61, Trap Unused Function Codes
4. Alpha -> Alpha AXP
5. Removed second paragraph in Section 3.3.4.1 (clerical error)


Revision 5.0, May 12, 1992

1. Removed references to SP and PS
2. Added unsigned multiplication operator
3. Added description of Fa, Fb registers if unused
4. Converted to SDML
5. Added Memory Format with Function Code section
6. Moved Instruction Operand section from Chapter 4
7. Edited description of R31
8. Separated operand notation from operand value notation and simplified language
9. Added comment and note to Section 3.3 that specifies the value assigned to unused
   register fields of instructions


Revision 4.0, March 29, 1991

1. Typos
2. Upgrade description of R30 and implicit stack behavior of HW/PALcode
3. Upgrade definition of byte_zap, access, left_shift, and right_shift operators
4. Add definition of single bit field select operator, <n>
5. Rename arith_shift operator to arith_right_shift and upgrade definition
6. Make test a dyadic operator with explicit condition argument
7. Define the CASE pseudocode construct
8. Include Processor Status register in description of Alpha registers
9. Add definitions of priority_encode and exponentiation (**) operators
10. Changed text describing R30
11. Changed two relational operator mnemonics


Revision 3.0, March 2, 1990

1. Under registers, add lock registers, IPRs, and optional registers
2. Define DIV, BYTE_ZAP, and PHYSICAL_ADDRESS; delete BYTE_SEL
3. Delete reference to R28


Revision 2.0, October 4, 1989

1. Add comment to section on PC that PC is not an Integer Register
2. Add comment that SP is R30
3. Change description of L field in Operate instruction format


Revision 1.0, May 23, 1989

1. Remove Rb reading as PC for Rb eq 0
2. Fix error in which bit is literal enable bit for Operate format
3. Add Floating-point Operate format


Revision 0.0, March 15, 1989

1. Initial version


Chapter 4


Instruction Descriptions (I) 


4.1 Instruction Set Overview 


This chapter describes the instructions implemented by the Alpha architecture. The instruction 
set is divided into the following sections: 


Instruction Type                    Section

Integer load and store              4.2
Integer control                     4.3
Integer arithmetic                  4.4
Logical and shift                   4.5
Byte manipulation                   4.6
Floating-point load and store       4.7
Floating-point control              4.8
Floating-point branch               4.9
Floating-point operate              4.10
Miscellaneous                       4.11
VAX compatibility                   4.12
Multimedia (graphics and video)     4.13


Within each major section, closely related instructions are combined into groups and described 
together. 


The instruction group description is composed of the following:


• The group name

• The format of each instruction in the group, which includes the name, access type, and
  data type of each instruction operand

• The operation of the instruction

• Exceptions specific to the instruction

• The instruction mnemonic and name of each instruction in the group

• Qualifiers specific to the instructions in the group

• A description of the instruction operation

• Optional programming examples and optional notes on the instruction


4.1.1 Subsetting Rules 


An instruction that is omitted in a subset implementation of the Alpha architecture is not per- 
formed in either hardware or PALcode. System software may provide emulation routines for 
subsetted instructions. 


4.1.2 Floating-Point Subsets 


Floating-point support is optional on an Alpha processor. An implementation that supports 
floating-point must implement the following: 


The 32 floating-point registers 

The Floating-point Control Register (FPCR) and the instructions to access it 
The floating-point branch instructions 

The floating-point copy sign (CPYSx) instructions 

The floating-point convert instructions 

The floating-point conditional move instruction (FCMOV)


The S_floating and T_floating memory operations 


Software Note:


A system that will not support floating-point operations is still required to provide the 32
floating-point registers, the Floating-point Control Register (FPCR) and the instructions to
access it, and the T_floating memory operations if the system intends to support the
OpenVMS Alpha operating system. This requirement facilitates the implementation of a
floating-point emulator and simplifies context-switching.


In addition, floating-point support requires at least one of the following subset groups:


1. VAX Floating-point Operate and Memory instructions (F_ and G_floating).

2. IEEE Floating-point Operate instructions (S_ and T_floating). Within this group, an
   implementation can choose to include or omit separately the ability to perform IEEE
   rounding to plus infinity and minus infinity.


Note:


If one instruction in a group is provided, all other instructions in that group must be
provided. An implementation with full floating-point support includes both groups; a
subset floating-point implementation supports only one of these groups. The individual
instruction descriptions indicate whether an instruction can be subsetted.


4.1.3 Software Emulation Rules


General-purpose layered and application software that executes in User mode may assume that
certain loads (LDL, LDQ, LDF, LDG, LDS, and LDT) and certain stores (STL, STQ, STF,
STG, STS, and STT) of unaligned data are emulated by system software. General-purpose
layered and application software that executes in User mode may assume that subsetted
instructions are emulated by system software. Frequent use of emulation may be significantly
slower than using alternative code sequences.


Emulation of loads and stores of unaligned data and subsetted instructions need not be pro- 
vided in privileged access modes. System software that supports special-purpose dedicated 
applications need not provide emulation in User mode if emulation is not needed for correct 
execution of the special-purpose applications. 


4.1.4 Opcode Qualifiers 


Some Operate format and Floating-point Operate format instructions have several variants. 
For example, for the VAX formats, Add F_floating (ADDF) is supported with and without 
floating underflow enabled and with either chopped or VAX rounding. For IEEE formats, 
IEEE unbiased rounding, chopped, round toward plus infinity, and round toward minus infin- 
ity can be selected. 


The different variants of such instructions are denoted by opcode qualifiers, which consist of a 
slash (/) followed by a string of selected qualifiers. Each qualifier is denoted by a single char- 
acter as shown in Table 4-1. The opcodes for each qualifier are listed in Appendix C. 


Table 4-1: Opcode Qualifiers 


Qualifier     Meaning

C             Chopped rounding
D             Rounding mode dynamic
M             Round toward minus infinity
I             Inexact result enable
S             Exception completion enable
U             Floating underflow enable
V             Integer overflow enable


The default values are normal rounding, exception completion disabled, inexact result dis- 
abled, floating underflow disabled, and integer overflow disabled. 
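

To make the qualifier notation concrete, the sketch below splits a mnemonic such as ADDF/UC
into its base opcode name and its list of qualifiers. It is an illustrative Python helper, not
an assembler interface: the function name and the dictionary are hypothetical, and the
single-character codes are those of Table 4-1.

```python
# Single-character opcode qualifiers and their meanings (per Table 4-1).
QUALIFIERS = {
    "C": "Chopped rounding",
    "D": "Rounding mode dynamic",
    "M": "Round toward minus infinity",
    "I": "Inexact result enable",
    "S": "Exception completion enable",
    "U": "Floating underflow enable",
    "V": "Integer overflow enable",
}


def split_qualifiers(mnemonic):
    """Split 'NAME/xyz' into ('NAME', [meaning of x, meaning of y, ...])."""
    name, _, quals = mnemonic.partition("/")
    unknown = [q for q in quals if q not in QUALIFIERS]
    if unknown:
        raise ValueError(f"unknown qualifier(s): {''.join(unknown)}")
    return name, [QUALIFIERS[q] for q in quals]
```

A mnemonic with no slash yields an empty qualifier list, which corresponds to the default
values listed above.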


4.2 Memory Integer Load/Store Instructions


The instructions in this section move data between the integer registers and memory. 


They use the Memory instruction format. The instructions are summarized in Table 4-2. 


Table 4-2: Memory Integer Load/Store Instructions 


Mnemonic      Operation

LDA           Load Address
LDAH          Load Address High
LDBU          Load Zero-Extended Byte from Memory to Register
LDL           Load Sign-Extended Longword
LDL_L         Load Sign-Extended Longword Locked
LDQ           Load Quadword
LDQ_L         Load Quadword Locked
LDQ_U         Load Quadword Unaligned
LDWU          Load Zero-Extended Word from Memory to Register
STB           Store Byte
STL           Store Longword
STL_C         Store Longword Conditional
STQ           Store Quadword
STQ_C         Store Quadword Conditional
STQ_U         Store Quadword Unaligned
STW           Store Word


4.2.1 Load Address


Format: 


    LDAx    Ra.wq,disp.ab(Rb.ab)            !Memory format


Operation:


    Ra <- Rbv + SEXT(disp)                  !LDA
    Ra <- Rbv + SEXT(disp*65536)            !LDAH


Exceptions:


    None


Instruction mnemonics:


    LDA     Load Address
    LDAH    Load Address High


Qualifiers:


    None


Description:


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement for LDA, and 65536 times the sign-extended 16-bit displacement for LDAH. The
64-bit result is written to register Ra.


4.2.2 Load Memory Data into Integer Register


Format:


    LDx     Ra.wq,disp.ab(Rb.ab)            !Memory format


Operation:


    va <- {Rbv + SEXT(disp)}

    CASE
        big_endian_data:    va' <- va XOR 000₂      !LDQ
        big_endian_data:    va' <- va XOR 100₂      !LDL
        big_endian_data:    va' <- va XOR 110₂      !LDWU
        big_endian_data:    va' <- va XOR 111₂      !LDBU
        little_endian_data: va' <- va
    ENDCASE

    Ra <- (va')<63:0>                               !LDQ
    Ra <- SEXT((va')<31:0>)                         !LDL
    Ra <- ZEXT((va')<15:0>)                         !LDWU
    Ra <- ZEXT((va')<07:0>)                         !LDBU


Exceptions:


    Access Violation
    Alignment
    Fault on Read
    Translation Not Valid


Instruction mnemonics:


    LDBU    Load Zero-Extended Byte from Memory to Register
    LDL     Load Sign-Extended Longword from Memory to Register
    LDQ     Load Quadword from Memory to Register
    LDWU    Load Zero-Extended Word from Memory to Register


Qualifiers:


    None


Description:


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. For a big-endian access, the indicated bits are inverted, and any memory
management fault is reported for va (not va').


In the case of LDQ and LDL, the source operand is fetched from memory, sign-extended, and
written to register Ra.


In the case of LDWU and LDBU, the source operand is fetched from memory, zero-extended, 
and written to register Ra. 


In all cases, if the data is not naturally aligned, an alignment exception is generated. 
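

The address adjustment and extension rules above can be exercised with a small model. The
Python sketch below is illustrative only. It assumes memory is a bytes object indexed by
virtual address, that the bytes of a datum are assembled in little-endian order once va' has
been formed, and that an alignment exception is represented by ValueError; the function name
load and its parameters are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Low-order address bits inverted for a big-endian access, keyed by size in bytes.
BIG_ENDIAN_XOR = {8: 0b000, 4: 0b100, 2: 0b110, 1: 0b111}


def load(mem, va, size, big_endian=False, sign_extend=False):
    """Model LDQ (8), LDL (4, sign_extend=True), LDWU (2), and LDBU (1)."""
    if va % size:
        raise ValueError("alignment exception: data is not naturally aligned")
    va_prime = va ^ BIG_ENDIAN_XOR[size] if big_endian else va
    value = int.from_bytes(mem[va_prime:va_prime + size], "little")
    if sign_extend and value >> (8 * size - 1):
        value |= MASK64 ^ ((1 << (8 * size)) - 1)   # SEXT to 64 bits
    return value                                    # otherwise ZEXT is implicit
```

For example, with the eight bytes 10 20 30 40 50 60 70 80 (hexadecimal) at addresses 0
through 7, a little-endian LDBU from address 0 returns 0x10, while a big-endian LDBU from
address 0 accesses va' = 7 and returns 0x80.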


Notes: 


• The word or byte that the LDWU or LDBU instruction fetches from memory is placed
  in the low (rightmost) word or byte of Ra, with the remaining 6 or 7 bytes set to zero.

• Accesses have byte granularity.

• For big-endian access with LDWU or LDBU, the word/byte remains in the rightmost
  part of Ra, but the va sent to memory has the indicated bits inverted. See Operation
  section, above.

• No sparse address space mechanisms are allowed with the LDWU and LDBU
  instructions.


Implementation Notes:


• The LDWU and LDBU instructions are supported in hardware on Alpha implementations
  for which the AMASK instruction returns bit 0 set. LDWU and LDBU are supported
  with software emulation in Alpha implementations for which AMASK does not
  return bit 0 set. Software emulation of LDWU and LDBU is significantly slower than
  hardware support.

• Depending on an address space region's caching policy, implementations may read a
  (partial) cache block in order to do word/byte stores. This may only be done in regions
  that have memory-like behavior.

• Implementations are expected to provide sufficient low-order address bits and
  length-of-access information to devices on I/O buses. But, strictly speaking, this is
  outside the scope of architecture.


4.2.3 Load Unaligned Memory Data into Integer Register


Format: 

    LDQ_U   Ra.wq,disp.ab(Rb.ab)            !Memory format


Operation:


    va <- {{Rbv + SEXT(disp)} AND NOT 7}
    Ra <- (va)<63:0>


Exceptions: 


Access Violation 
Fault on Read 
Translation Not Valid 


Instruction mnemonics: 


LDQ_U Load Unaligned Quadword from Memory to Register 


Qualifiers: 


None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment, then the low-order three bits are cleared. The source operand is fetched from memory 
and written to register Ra. 
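

The effect of clearing the low-order three bits can be seen in a short model. This Python
sketch is illustrative; the function name ldq_u_address is hypothetical and values are treated
as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def ldq_u_address(rbv, disp):
    """va <- {Rbv + SEXT(disp)} AND NOT 7, for a 16-bit displacement field."""
    disp &= 0xFFFF
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 0x10000
    return (rbv + disp) & ~7 & MASK64
```

Any byte address within a quadword therefore maps to the address of the aligned quadword
that contains it, so LDQ_U itself never takes an alignment exception.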


4.2.4 Load Memory Data into Integer Register Locked


Format:


    LDx_L   Ra.wq,disp.ab(Rb.ab)            !Memory format


Operation:


    va <- {Rbv + SEXT(disp)}

    CASE
        big_endian_data:    va' <- va XOR 000₂      !LDQ_L
        big_endian_data:    va' <- va XOR 100₂      !LDL_L
        little_endian_data: va' <- va               !LDL_L
    ENDCASE

    lock_flag <- 1
    locked_physical_address <- PHYSICAL_ADDRESS(va)

    Ra <- SEXT((va')<31:0>)                         !LDL_L
    Ra <- (va)<63:0>                                !LDQ_L


Exceptions: 


Access Violation 
Alignment 

Fault on Read 
Translation Not Valid 


Instruction mnemonics: 


    LDL_L   Load Sign-Extended Longword from Memory to Register Locked
    LDQ_L   Load Quadword from Memory to Register Locked


Qualifiers: 
None 
Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted,
and any memory management fault is reported for va (not va'). The source operand is fetched
from memory, sign-extended for LDL_L, and written to register Ra.


When a LDx_L instruction is executed without faulting, the processor records the target
physical address in a per-processor locked_physical_address register and sets the per-processor
lock_flag.


If the per-processor lock_flag is (still) set when a STx_C instruction is executed (accessing 
within the same 16-byte naturally aligned block as the LDx_L), the store occurs; otherwise, it 
does not occur, as described for the STx_C instructions. The behavior of an STx_C instruction 
is UNPREDICTABLE, as described in Section 4.2.5, when it does not access the same 
16-byte naturally aligned block as the LDx_L. 


Processor A causes the clearing of a set lock_flag in processor B by doing any of the following 
in B’s locked range of physical addresses: a successful store, a successful store_conditional, or 
executing a WH64 instruction that modifies data on processor B. A processor’s locked range is 
the aligned block of 2**N bytes that includes the locked_physical_address. The 2**N value is 
implementation dependent. It is at least 16 (minimum lock range is an aligned 16-byte block) 
and is at most the page size for that implementation (maximum lock range is one physical
page).


A processor’s lock_flag is also cleared if that processor encounters a CALL_PAL REI, 
CALL_PAL rti, or CALL_PAL rfe instruction. It is UNPREDICTABLE whether or not a pro- 
cessor’s lock_flag is cleared on any other CALL_PAL instruction. It is UNPREDICTABLE 
whether a processor’s lock_flag is cleared by that processor executing a normal load or store 
instruction. It is UNPREDICTABLE whether a processor’s lock_flag is cleared by that proces- 
sor executing a taken branch (including BR, BSR, and Jumps); conditional branches that fall 
through do not clear the lock_flag. It is UNPREDICTABLE whether a processor’s lock_flag 
is cleared by that processor executing a WH64 or ECB instruction. 


The sequence: 


LDx_L 
Modify 
STx_C 
BEQ xxx 


when executed on a given processor, does an atomic read-modify-write of a datum in shared 
memory if the branch falls through. If the branch is taken, the store did not modify memory 
and the sequence may be repeated until it succeeds. 
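

The retry behavior of this sequence can be illustrated with a toy model of the lock_flag. The
Python sketch below is a simplification under stated assumptions, not a description of any
implementation: shared memory is a dictionary, the lock range is taken to be the minimum
aligned 16-byte block, and the class, method, and function names are hypothetical.

```python
LOCK_RANGE = 16  # assumption: the minimum architected lock range, in bytes


class Processor:
    def __init__(self, memory):
        self.memory = memory                    # shared dict: address -> value
        self.lock_flag = 0
        self.locked_physical_address = None

    def load_locked(self, addr):                # models LDx_L
        self.lock_flag = 1
        self.locked_physical_address = addr
        return self.memory[addr]

    def store_conditional(self, addr, value):   # models STx_C
        success = self.lock_flag
        if success:
            self.memory[addr] = value
        self.lock_flag = 0                      # cleared whether or not the store occurred
        return success


def plain_store(writer, others, addr, value):
    """A successful store by one processor clears other processors' matching locks."""
    writer.memory[addr] = value
    for cpu in others:
        if cpu.lock_flag and cpu.locked_physical_address // LOCK_RANGE == addr // LOCK_RANGE:
            cpu.lock_flag = 0


def atomic_add(cpu, addr, delta, interfere=None):
    """LDx_L / Modify / STx_C / BEQ loop; returns the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        value = cpu.load_locked(addr) + delta   # LDx_L, then Modify
        if interfere and attempts == 1:
            interfere()                         # another processor stores in the lock range
        if cpu.store_conditional(addr, value):  # STx_C; a zero result means retry (BEQ taken)
            return attempts
```

With no interference the loop completes on the first attempt; if another processor stores
anywhere in the same 16-byte block between the load and the store, the first STx_C fails and
the sequence is repeated.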


Notes: 


• LDx_L instructions do not check for write access; hence a matching STx_C may take
  an access-violation or fault-on-write exception.

• Executing a LDx_L instruction on one processor does not affect any architecturally
  visible state on another processor, and in particular cannot cause an STx_C on another
  processor to fail.

• LDx_L and STx_C instructions need not be paired. In particular, an LDx_L may be
  followed by a conditional branch: on the fall-through path an STx_C is executed,
  whereas on the taken path no matching STx_C is executed.

• If two LDx_L instructions execute with no intervening STx_C, the second one
  overwrites the state of the first one. If two STx_C instructions execute with no
  intervening LDx_L, the second one always fails because the first clears lock_flag.

• Software will not emulate unaligned LDx_L instructions.

• If the virtual and physical addresses for a LDx_L and STx_C sequence are not within
  the same naturally aligned 16-byte sections of virtual and physical memory, that
  sequence may always fail, or may succeed despite another processor's store to the lock
  range; hence, no useful program should do this.

• If any other memory access (ECB, LDx, LDQ_U, STx, STQ_U, WH64) is executed on
  the given processor between the LDx_L and the STx_C, the sequence above may
  always fail on some implementations; hence, no useful program should do this.

• If a branch is taken between the LDx_L and the STx_C, the sequence above may
  always fail on some implementations; hence, no useful program should do this.
  (CMOVxx may be used to avoid branching.)

• If a subsetted instruction (for example, floating-point) is executed between the LDx_L
  and the STx_C, the sequence above may always fail on some implementations because
  of the Illegal Instruction Trap; hence, no useful program should do this.

• If an instruction with an unused function code is executed between the LDx_L and the
  STx_C, the sequence above may always fail on some implementations because an
  instruction with an unused function code is UNPREDICTABLE.

• If a large number of instructions are executed between the LDx_L and the STx_C, the
  sequence above may always fail on some implementations because of a timer interrupt
  always clearing the lock_flag before the sequence completes; hence, no useful program
  should do this.

• Hardware implementations are encouraged to lock no more than 128 bytes. Software
  implementations are encouraged to separate locked locations by at least 128 bytes from
  other locations that could potentially be written by another processor while the first
  location is locked.

• Execution of a WH64 instruction on processor A to a region within the lock range of
  processor B, where the execution of the WH64 changes the contents of memory, causes
  the lock_flag on processor B to be cleared. If the WH64 does not change the contents of
  memory on processor B, it need not clear the lock_flag.


Implementation Notes: 


Implementations that impede the mobility of a cache block on LDx_L, such as that which 
may occur in a Read for Ownership cache coherency protocol, may release the cache 
block and make the subsequent STx_C fail if a branch-taken or memory instruction is 
executed on that processor. 


All implementations should guarantee that at least 40 non-subsetted operate instructions 
can be executed between timer interrupts. 
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4.2.5 Store Integer Register Data into Memory Conditional 
Format: 
STx_C Ra.mx,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}

CASE
    big_endian_data: va' ← va XOR 000₂      ! STQ_C
    big_endian_data: va' ← va XOR 100₂      ! STL_C
    little_endian_data: va' ← va
ENDCASE

IF lock_flag EQ 1 THEN
    (va')<31:0> ← Rav<31:0>                 ! STL_C
    (va') ← Rav                             ! STQ_C
Ra ← lock_flag
lock_flag ← 0


Exceptions: 


Access Violation 
Fault on Write 
Alignment 


Translation Not Valid 


Instruction mnemonics: 


STL_C Store Longword from Register to Memory Conditional 
STQ_C Store Quadword from Register to Memory Conditional 
Qualifiers: 
None 
Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. For a big-endian longword access, va<2> (bit 2 of the virtual address) is
inverted, and any memory management fault is reported for va (not va').


If the lock_flag is set and the address meets the following constraints relative to the address 
specified by the preceding LDx_L instruction, the Ra operand is written to memory at this 
address. If the address meets the following constraints but the lock_flag is not set, a zero is 
returned in Ra and no write to memory occurs. The constraints are: 
• The computed virtual address must specify a location within the naturally aligned
16-byte block in virtual memory accessed by the preceding LDx_L instruction.

• The resultant physical address must specify a location within the naturally aligned
16-byte block in physical memory accessed by the preceding LDx_L instruction.


If those addressing constraints are not met, it is UNPREDICTABLE whether the STx_C
instruction succeeds or fails, regardless of the state of the lock_flag, unless the lock_flag is
cleared as described in the next paragraph.


Whether or not the addressing constraints are met, a zero is returned and no write to memory 
occurs if the lock_flag was cleared by execution on a processor of a CALL_PAL REI, 
CALL_PAL rti, CALL_PAL rfe, or STx_C, after the most recent execution on that processor 
of a LDx_L instruction (in processor issue sequence). 


In all cases, the lock_flag is set to zero at the end of the operation. 
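
The success and failure rules above can be sketched as a toy model. The Python below is an illustration only and is not part of the architecture: the class name Processor, the method observe_remote_store, and the fixed 16-byte lock block are assumptions made for the sketch (actual lock ranges are implementation dependent), and a STx_C outside the locked block is modeled as a failure even though the architecture leaves that outcome UNPREDICTABLE.

```python
class Processor:
    """Toy model of one processor's lock_flag (hypothetical, for illustration)."""

    BLOCK = 16  # assumed lock granularity: one naturally aligned 16-byte block

    def __init__(self, memory):
        self.memory = memory        # dict: address -> quadword, shared by all processors
        self.lock_flag = 0
        self.locked_block = None

    def ldq_l(self, addr):
        # LDQ_L: load the value, set lock_flag, remember the locked block.
        self.lock_flag = 1
        self.locked_block = addr // self.BLOCK
        return self.memory.get(addr, 0)

    def stq_c(self, addr, value):
        # STQ_C: store only if lock_flag is still set; return the value left in Ra.
        # Assumption: an address outside the locked block is treated as a failure.
        ok = self.lock_flag == 1 and addr // self.BLOCK == self.locked_block
        if ok:
            self.memory[addr] = value
        self.lock_flag = 0          # lock_flag is always cleared at the end
        return 1 if ok else 0

    def observe_remote_store(self, addr):
        # Another processor's store into the locked block clears lock_flag.
        if self.locked_block == addr // self.BLOCK:
            self.lock_flag = 0
```

In this model a second stq_c with no intervening ldq_l always returns 0, because the first stq_c cleared lock_flag.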


Notes: 
• Software will not emulate unaligned STx_C instructions.

• Each implementation must do the test and store atomically, as illustrated in the
following two examples. (See Section 5.6.1 for complete information.)

  - If two processors attempt STx_C instructions to the same lock range and that lock
    range was accessed by both processors' preceding LDx_L instructions, exactly one
    of the stores succeeds.

  - A processor executes a LDx_L/STx_C sequence and includes an MB between the
    LDx_L to a particular address and the successful STx_C to a different address (one
    that meets the constraints required for predictable behavior). That instruction
    sequence establishes an access order under which a store operation by another
    processor to that lock range occurs before the LDx_L or after the STx_C.

• If the virtual and physical addresses for a LDx_L and STx_C sequence are not within
the same naturally aligned 16-byte sections of virtual and physical memory, that
sequence may always fail, or may succeed despite another processor's store to the lock
range; hence, no useful program should do this.

• The following sequence should not be used:

  try_again:  LDQ_L  R1, x
              <modify R1>
              STQ_C  R1, x
              BEQ    R1, try_again

That sequence penalizes performance when the STQ_C succeeds, because the
sequence contains a backward branch, which is predicted to be taken in the Alpha
architecture. In the case where the STQ_C succeeds and the branch will actually fall
through, that sequence incurs unnecessary delay due to a mispredicted backward
branch. Instead, a forward branch should be used to handle the failure case, as shown
in Section 5.5.2.
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Software Note: 


If the address specified by a STx_C instruction does not match the one given in the 
preceding LDx_L instruction, an MB is required to guarantee ordering between the two 
instructions. 


Hardware/Software Implementation Note: 


STQ_C is used in the first Alpha implementations to access the MailBox Pointer Register 
(MBPR). In this special case, the effect of the STQ_C is well defined (that is, not 
UNPREDICTABLE) even though the preceding LDx_L did not specify the address of the 
MBPR. The effect of STx_C in this special case may vary from implementation to 
implementation. 


Implementation Notes: 


A STx_C must propagate to the point of coherency, where it is guaranteed to prevent any 
other store from changing the state of the lock bit, before its outcome can be determined. 


If an implementation could encounter a TB or cache miss on the data reference of the 
STx_C in the sequence above (as might occur in some shared I- and D-stream 
direct-mapped TBs/caches), it must be able to resolve the miss and complete the store 
without always failing. 
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4.2.6 Store Integer Register Data into Memory 
Format: 
STx Ra.rx,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}

CASE
    big_endian_data: va' ← va XOR 000₂      !STQ
    big_endian_data: va' ← va XOR 100₂      !STL
    big_endian_data: va' ← va XOR 110₂      !STW
    big_endian_data: va' ← va XOR 111₂      !STB
    little_endian_data: va' ← va
ENDCASE
(va') ← Rav                                 !STQ
(va')<31:00> ← Rav<31:0>                    !STL
(va')<15:00> ← Rav<15:0>                    !STW
(va')<07:00> ← Rav<07:0>                    !STB
Exceptions: 
Access Violation 
Alignment 
Fault on Write 
Translation Not Valid 
Instruction mnemonics: 
STB Store Byte from Register to Memory 
STL Store Longword from Register to Memory 
STQ Store Quadword from Register to Memory 
STW Store Word from Register to Memory 
Qualifiers: 
None 
Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. For a big-endian access, the indicated bits are inverted, and any memory
management fault is reported for va (not va').
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The Ra operand is written to memory at this address. If the data is not naturally aligned, an 
alignment exception is generated. 


Notes: 


• The word or byte that the STB or STW instruction stores to memory comes from the
low (rightmost) byte or word of Ra.

• Accesses have byte granularity.

• For big-endian access with STB or STW, the byte/word remains in the rightmost part of
Ra, but the va sent to memory has the indicated bits inverted. See Operation section,
above.

• No sparse address space mechanisms are allowed with the STB and STW instructions.


Implementation Notes: 


• The STB and STW instructions are supported in hardware on Alpha implementations
for which the AMASK instruction returns bit 0 set. STB and STW are supported with
software emulation in Alpha implementations for which AMASK does not return bit 0
set. Software emulation of STB and STW is significantly slower than hardware support.

• Depending on an address space region's caching policy, implementations may read a
(partial) cache block in order to do byte/word stores. This may only be done in regions
that have memory-like behavior.

• Implementations are expected to provide sufficient low-order address bits and
length-of-access information to devices on I/O buses. But, strictly speaking, this is
outside the scope of architecture.
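
The big-endian address adjustment in the Operation section can be illustrated with a short sketch. The Python below is not part of the architecture; the function name effective_store_address and the table name BIG_ENDIAN_XOR are hypothetical, and the code only mirrors the CASE statement shown for STx.

```python
# XOR masks applied to the low three address bits for big-endian data,
# taken from the CASE statement in the Operation section.
BIG_ENDIAN_XOR = {"STQ": 0b000, "STL": 0b100, "STW": 0b110, "STB": 0b111}


def effective_store_address(va, mnemonic, big_endian):
    """Return va' for a store; little-endian data uses va unchanged."""
    if big_endian:
        return va ^ BIG_ENDIAN_XOR[mnemonic]
    return va
```

For example, under this sketch a big-endian STB to va 0x1000 is sent to memory at 0x1007, while any memory management fault would still be reported for 0x1000.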
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4.2.7 Store Unaligned Integer Register Data into Memory 


Format: 


STQ_U Ra.rq,disp.ab(Rb.ab) !Memory format 


Operation: 


va ← {{Rbv + SEXT(disp)} AND NOT 7}
(va)<63:0> ← Rav<63:0>


Exceptions: 


Access Violation 
Fault on Write 


Translation Not Valid 


Instruction mnemonics: 


STQ_U Store Unaligned Quadword from Register to Memory 


Qualifiers: 


None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement, then clearing the low order three bits. The Ra operand is written to memory
at this address.
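
As an illustration of the address calculation, the following Python sketch computes the STQ_U target address. The function names sext16 and stq_u_address are hypothetical and the code is only a model of the Operation shown above.

```python
MASK64 = (1 << 64) - 1


def sext16(disp):
    """Sign-extend a 16-bit displacement field to a Python int."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def stq_u_address(rbv, disp):
    """va = {Rbv + SEXT(disp)} AND NOT 7, wrapped to 64 bits."""
    return (rbv + sext16(disp)) & ~7 & MASK64
```

Any address within a given quadword therefore maps to the same aligned quadword, which is why STQ_U has no Alignment exception.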
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4.3 Control Instructions 


Alpha provides integer conditional branch, unconditional branch, branch to subroutine, and 
jump instructions. The PC used in these instructions is the updated PC, as described in Section 
3.1.1. 


To allow implementations to achieve high performance, the Alpha architecture includes 
explicit hints based on a branch-prediction model: 


• For many implementations of computed branches (JSR/RET/JMP), there is a
substantial performance gain in forming a good guess of the expected target I-cache
address before register Rb is accessed.

• For many implementations, the first-level (or only) I-cache is no bigger than a page (8
KB to 64 KB).

• Correctly predicting subroutine returns is important for good performance. Some
implementations will therefore keep a small stack of predicted subroutine return
I-cache addresses.


The Alpha architecture provides three kinds of branch-prediction hints: likely target address, 
return-address stack action, and conditional branch-taken. 


For computed branches, the otherwise unused displacement field contains a function code 
(JMP/JSR/RET/JSR_COROUTINE), and, for JSR and JMP, a field that statically specifies the 
16 low bits of the most likely target address. The PC-relative calculation using these bits can 
be exactly the PC-relative calculation used in unconditional branches. The low 16 bits are 
enough to specify an I-cache block within the largest possible Alpha page and hence are 
expected to be enough for branch-prediction logic to start an early I-cache access for the most 
likely target. 


For all branches, hint or opcode bits are used to distinguish simple branches, subroutine calls,
subroutine returns, and coroutine links. These distinctions allow branch-predict logic to
maintain an accurate stack of predicted return addresses.


For conditional branches, the sign of the target displacement is used as a taken/fall-through 
hint. The instructions are summarized in Table 4-3. 


Table 4-3: Control Instructions Summary

Mnemonic         Operation
BEQ              Branch if Register Equal to Zero
BGE              Branch if Register Greater Than or Equal to Zero
BGT              Branch if Register Greater Than Zero
BLBC             Branch if Register Low Bit Is Clear
BLBS             Branch if Register Low Bit Is Set
BLE              Branch if Register Less Than or Equal to Zero
BLT              Branch if Register Less Than Zero
BNE              Branch if Register Not Equal to Zero
BR               Unconditional Branch
BSR              Branch to Subroutine
JMP              Jump
JSR              Jump to Subroutine
RET              Return from Subroutine
JSR_COROUTINE    Jump to Subroutine Return
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4.3.1 Conditional Branch 


Format: 
Bxx Ra.rq,disp.al !Branch format 


Operation: 


{update PC}
va ← PC + {4*SEXT(disp)}
IF TEST(Rav, Condition_based_on_Opcode) THEN
    PC ← va


Exceptions: 


None 


Instruction mnemonics: 


BEQ Branch if Register Equal to Zero 
BGE Branch if Register Greater Than or Equal to Zero 
BGT Branch if Register Greater Than Zero 
BLBC Branch if Register Low Bit Is Clear 
BLBS Branch if Register Low Bit Is Set 
BLE Branch if Register Less Than or Equal to Zero 
BLT Branch if Register Less Than Zero 
BNE Branch if Register Not Equal to Zero 
Qualifiers: 
None 
Description: 


Register Ra is tested. If the specified relationship is true, the PC is loaded with the target
virtual address; otherwise, execution continues with the next sequential instruction.


The displacement is treated as a signed longword offset. This means it is shifted left two bits 
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to 
form the target virtual address. 


The conditional branch instructions are PC-relative only. The 21-bit signed displacement gives
a forward/backward branch distance of +/- 1M instructions.


The test is on the signed quadword integer interpretation of the register contents; all 64 bits are 
tested. 
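
The target calculation and the register test described above can be sketched in Python. This is an illustrative model only; the names sext, CONDITIONS, and conditional_branch are hypothetical, and the updated PC is taken to be the address of the next sequential instruction (PC + 4).

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


# Tests on the signed quadword interpretation of Rav.
CONDITIONS = {
    "BEQ": lambda r: r == 0,
    "BNE": lambda r: r != 0,
    "BLT": lambda r: r < 0,
    "BLE": lambda r: r <= 0,
    "BGT": lambda r: r > 0,
    "BGE": lambda r: r >= 0,
    "BLBC": lambda r: (r & 1) == 0,
    "BLBS": lambda r: (r & 1) == 1,
}


def conditional_branch(mnemonic, rav, pc, disp21):
    """Return the next PC for a Bxx instruction located at address pc."""
    updated_pc = (pc + 4) & MASK64
    target = (updated_pc + 4 * sext(disp21, 21)) & MASK64
    taken = CONDITIONS[mnemonic](sext(rav, 64))
    return target if taken else updated_pc
```

A displacement of all ones (-1) branches back to the branch instruction itself, because the offset is applied to the updated PC.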
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4.3.2 Unconditional Branch 


Format: 
BxR Ra.wq,disp.al !Branch format 


Operation: 


{update PC}
Ra ← PC
PC ← PC + {4*SEXT(disp)}


Exceptions: 


None 


Instruction mnemonics: 


BR Unconditional Branch 
BSR Branch to Subroutine 
Qualifiers: 
None 
Description: 


The PC of the following instruction (the updated PC) is written to register Ra and then the PC 
is loaded with the target address. 


The displacement is treated as a signed longword offset. This means it is shifted left two bits 
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to 
form the target virtual address. 


The unconditional branch instructions are PC-relative. The 21-bit signed displacement gives a
forward/backward branch distance of +/- 1M instructions.


PC-relative addressability can be established by: 


    BR Rx,L1
L1:


Notes: 


• BR and BSR do identical operations. They only differ in hints to possible
branch-prediction logic. BSR is predicted as a subroutine call (pushes the return address on a
branch-prediction stack), whereas BR is predicted as a branch (no push).
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4.3.3 Jumps 


Format:
mnemonic Ra.wq,(Rb.ab),hint    !Memory format

Operation:
{update PC}
va ← Rbv AND {NOT 3}
Ra ← PC
PC ← va

Exceptions:
None

Instruction mnemonics:
JMP              Jump
JSR              Jump to Subroutine
RET              Return from Subroutine
JSR_COROUTINE    Jump to Subroutine Return

Qualifiers:
None

Description:

The PC of the instruction following the Jump instruction (the updated PC) is written to register
Ra and then the PC is loaded with the target virtual address.

The new PC is supplied from register Rb. The low two bits of Rb are ignored. Ra and Rb may
specify the same register; the target calculation using the old value is done before the new
value is assigned.


All Jump instructions do identical operations. They only differ in hints to possible
branch-prediction logic. The displacement field of the instruction is used to pass this
information. The four different "opcodes" set different bit patterns in disp<15:14>, and the
hint operand sets disp<13:0>.


These bits are intended to be used as shown in Table 4-4. 
Table 4-4: Jump Instructions Branch Prediction

disp<15:14>   Meaning          Predicted Target<15:0>   Prediction Stack Action
00            JMP              PC + {4*disp<13:0>}      -
01            JSR              PC + {4*disp<13:0>}      Push PC
10            RET              Prediction stack         Pop
11            JSR_COROUTINE    Prediction stack         Pop, push PC


The design in Table 4-4 allows specification of the low 16 bits of a likely longword target
address (enough bits to start a useful I-cache access early), and also allows distinguishing call
from return (and from the other two less frequent operations).


Note that the above information is used only as a hint; correct setting of these bits can improve 
performance but is not needed for correct operation. See Appendix A for more information on 
branch prediction. 
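
How a predictor might interpret the hint bits can be sketched as follows. The Python below is illustrative only: the function name decode_jump_hint and the table name KINDS are hypothetical, and the sketch simply splits the 16-bit displacement field into disp<15:14> and disp<13:0> as described above. It is not a description of any particular implementation's prediction logic.

```python
# disp<15:14> selects which of the four Jump "opcodes" was written.
KINDS = {0b00: "JMP", 0b01: "JSR", 0b10: "RET", 0b11: "JSR_COROUTINE"}


def decode_jump_hint(disp16, updated_pc):
    """Return (kind, predicted low 16 bits of the target or None)."""
    kind = KINDS[(disp16 >> 14) & 0b11]
    hint = disp16 & 0x3FFF
    if kind in ("JMP", "JSR"):
        # Low 16 bits of the likely target: PC + 4*disp<13:0>.
        predicted_low16 = (updated_pc + 4 * hint) & 0xFFFF
    else:
        # RET and JSR_COROUTINE take their prediction from the return stack.
        predicted_low16 = None
    return kind, predicted_low16
```

Because these bits are only a hint, a wrong value returned by such a decoder would cost performance but would not change the architectural result of the jump.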


An unconditional long jump can be performed by: 
    JMP R31,(Rb),hint


Coroutine linkage can be performed by specifying the same register in both the Ra and Rb
operands. When disp<15:14> equals '10' (RET) or '11' (JSR_COROUTINE) (that is, the
target address prediction, if any, would come from a predictor implementation stack), then bits
<13:0> are reserved for software and must be ignored by all implementations. All encodings
for bits <13:0> are used by DIGITAL software or Reserved to DIGITAL, as follows:

Encoding    Meaning
0000₁₆      Indicates non-procedure return
0001₁₆      Indicates procedure return

All other encodings are reserved to DIGITAL.
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4.4 Integer Arithmetic Instructions 


The integer arithmetic instructions perform add, subtract, multiply, signed and unsigned
compare, and bit count operations.


The integer instructions are summarized in Table 4-5.


Table 4-5: Integer Arithmetic Instructions Summary 


Mnemonic Operation 

ADD Add Quadword/Longword 

S4ADD Scaled Add by 4 

S8ADD Scaled Add by 8 

CMPEQ Compare Signed Quadword Equal 

CMPLT Compare Signed Quadword Less Than 

CMPLE Compare Signed Quadword Less Than or Equal 
CTLZ Count leading zero 

CTPOP Count population 

CTTZ Count trailing zero 

CMPULT Compare Unsigned Quadword Less Than 
CMPULE Compare Unsigned Quadword Less Than or Equal 
MUL Multiply Quadword/Longword 

UMULH Multiply Quadword Unsigned High 

SUB Subtract Quadword/Longword 

S4SUB Scaled Subtract by 4 

S8SUB Scaled Subtract by 8 


There is no integer divide instruction. Division by a constant can be done by using UMULH; 
division by a variable can be done by using a subroutine. See Appendix A. 
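
As an illustration of division by a constant, the following Python sketch divides an unsigned quadword by 10 using a UMULH-style multiply followed by a right shift. The function names umulh and udiv10 are hypothetical, and the multiplier 0xCCCCCCCCCCCCCCCD (the ceiling of 2^67 / 10) is a commonly used reciprocal constant chosen for this sketch; it is not taken from this manual.

```python
MASK64 = (1 << 64) - 1


def umulh(a, b):
    """High 64 bits of the unsigned 128-bit product, as UMULH returns."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def udiv10(n):
    """Unsigned n / 10 without a divide: UMULH by ceil(2**67 / 10), then SRL by 3."""
    return umulh(n, 0xCCCCCCCCCCCCCCCD) >> 3
```

The multiply supplies a shift of 64 bits and the SRL supplies the remaining 3, so the pair together computes (n * multiplier) / 2^67.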
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4.4.1 Longword Add 


Format:
ADDL Ra.rl,Rb.rl,Rc.wq    !Operate format
ADDL Ra.rl,#b.ib,Rc.wq    !Operate format

Operation:
Rc ← SEXT( (Rav + Rbv)<31:0>)

Exceptions:
Integer Overflow

Instruction mnemonics:
ADDL    Add Longword

Qualifiers:
Integer Overflow Enable (/V)

Description:

Register Ra is added to register Rb or a literal and the sign-extended 32-bit sum is written to
Rc.

The high order 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated
32-bit sum. Overflow detection is based on the longword sum Rav<31:0> + Rbv<31:0>.
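
The truncation, sign extension, and overflow rule can be modeled with a short Python sketch. The function names sext and addl are hypothetical; the overflow flag returned here only indicates the condition that the /V qualifier would report and is not a model of trap delivery.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addl(rav, rbv):
    """Return (Rc as a 64-bit pattern, overflow) for ADDL."""
    a = sext(rav, 32)                 # high 32 bits of Ra are ignored
    b = sext(rbv, 32)                 # high 32 bits of Rb are ignored
    true_sum = a + b
    rc = sext(true_sum, 32) & MASK64  # sign-extended truncated 32-bit sum
    overflow = not (-(1 << 31) <= true_sum < (1 << 31))
    return rc, overflow
```

For instance, adding 1 to 0x7FFFFFFF yields the pattern 0xFFFFFFFF80000000 with the overflow condition set, since the longword sum does not fit in 32 signed bits.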
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4.4.2 Scaled Longword Add 


Format: 
SxADDL Ra.rl,Rb.rl,Rc.wq    !Operate format
SxADDL Ra.rl,#b.ib,Rc.wq    !Operate format
Operation:
CASE
    S4ADDL: Rc ← SEXT (((LEFT_SHIFT(Rav,2)) + Rbv)<31:0>)
    S8ADDL: Rc ← SEXT (((LEFT_SHIFT(Rav,3)) + Rbv)<31:0>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


S4ADDL Scaled Add Longword by 4 
S8ADDL Scaled Add Longword by 8 
Qualifiers: 
None 
Description: 


Register Ra is scaled by 4 (for S4ADDL) or 8 (for S8ADDL) and is added to register Rb or a
literal, and the sign-extended 32-bit sum is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 
32-bit sum. 
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4.4.3 Quadword Add 


Format:
ADDQ Ra.rq,Rb.rq,Rc.wq    !Operate format
ADDQ Ra.rq,#b.ib,Rc.wq    !Operate format

Operation:
Rc ← Rav + Rbv

Exceptions:
Integer Overflow

Instruction mnemonics:
ADDQ    Add Quadword

Qualifiers:
Integer Overflow Enable (/V)

Description:

Register Ra is added to register Rb or a literal and the 64-bit sum is written to Rc.

On overflow, the least significant 64 bits of the true result are written to the destination
register.

The unsigned compare instructions can be used to generate carry. After adding two values, if
the sum is less unsigned than either one of the inputs, there was a carry out of the most
significant bit.
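
The carry technique can be sketched in Python as a model of an ADDQ followed by a CMPULT. The function names addq_with_carry and add128 are hypothetical, and the 128-bit add is just one possible use of the generated carry.

```python
MASK64 = (1 << 64) - 1


def addq_with_carry(a, b):
    """ADDQ then CMPULT sum,a: returns (64-bit sum, carry out)."""
    total = (a + b) & MASK64        # ADDQ keeps the low 64 bits
    carry = 1 if total < a else 0   # sum less unsigned than an input => carry
    return total, carry


def add128(a_hi, a_lo, b_hi, b_lo):
    """Add two 128-bit values held as (high, low) quadword pairs."""
    lo, carry = addq_with_carry(a_lo, b_lo)
    hi = (a_hi + b_hi + carry) & MASK64
    return hi, lo
```

The inputs are assumed to be unsigned 64-bit register patterns (0 through 2^64 - 1).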
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4.4.4 Scaled Quadword Add 


Format:
SxADDQ Ra.rq,Rb.rq,Rc.wq    !Operate format
SxADDQ Ra.rq,#b.ib,Rc.wq    !Operate format

Operation:
CASE
    S4ADDQ: Rc ← LEFT_SHIFT(Rav,2) + Rbv
    S8ADDQ: Rc ← LEFT_SHIFT(Rav,3) + Rbv
ENDCASE

Exceptions:
None

Instruction mnemonics:
S4ADDQ    Scaled Add Quadword by 4
S8ADDQ    Scaled Add Quadword by 8

Qualifiers:
None

Description:

Register Ra is scaled by 4 (for S4ADDQ) or 8 (for S8ADDQ) and is added to register Rb or a
literal, and the 64-bit sum is written to Rc.

On overflow, the least significant 64 bits of the true result are written to the destination
register.
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4.4.5 Integer Signed Compare 


Format:
CMPxx Ra.rq,Rb.rq,Rc.wq    !Operate format
CMPxx Ra.rq,#b.ib,Rc.wq    !Operate format

Operation:
IF Rav SIGNED_RELATION Rbv THEN
    Rc ← 1
ELSE
    Rc ← 0


Exceptions: 


None 


Instruction mnemonics: 


CMPEQ Compare Signed Quadword Equal 
CMPLE Compare Signed Quadword Less Than or Equal 
CMPLT Compare Signed Quadword Less Than 
Qualifiers: 
None 
Description: 


Register Ra is compared to Register Rb or a literal. If the specified relationship is true, the 
value one is written to register Rc; otherwise, zero is written to Rc. 


Notes: 


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less
Than or Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only
the less-than operations are included.
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4.4.6 Integer Unsigned Compare 


Format: 


CMPUxx Ra.rq,Rb.rq,Rc.wq !Operate format 


CMPUxx Ra.rq,#b.ib,Rc.wq !Operate format 


Operation: 


IF Rav UNSIGNED_RELATION Rbv THEN
    Rc ← 1
ELSE
    Rc ← 0


Exceptions: 


None 


Instruction mnemonics: 


CMPULE Compare Unsigned Quadword Less Than or Equal 
CMPULT Compare Unsigned Quadword Less Than 
Qualifiers: 
None 
Description: 


Register Ra is compared to Register Rb or a literal. If the specified relationship is true, the 
value one is written to register Rc; otherwise, zero is written to Rc. 
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4.4.7 Count Leading Zero 


Format: 
CTLZ Rb.rq,Rc.wq ! Operate format 


Operation: 


temp = 0
FOR i FROM 63 DOWN TO 0
    IF { Rbv<i> EQ 1 } THEN BREAK
    temp = temp + 1
END
Rc<6:0> ← temp<6:0>
Rc<63:7> ← 0


Exceptions: 


None 


Instruction mnemonics: 


CTLZ Count Leading Zero 


Qualifiers: 


None 


Description: 


The number of leading zeros in Rb, starting at the most significant bit position, is written to 
Rc. Ra must be R31. 
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4.4.8 Count Population 


Format: 


CTPOP Rb.rq,Rc.wq ! Operate format 


Operation: 
temp = 0
FOR i FROM 0 TO 63
    IF { Rbv<i> EQ 1 } THEN temp = temp + 1
END
Rc<6:0> ← temp<6:0>
Rc<63:7> ← 0


Exceptions: 


None 


Instruction mnemonics: 


CTPOP Count Population 


Qualifiers: 


None 


Description: 


The number of ones in Rb is written to Rc. Ra must be R31. 
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4.4.9 Count Trailing Zero 


Format: 
CTTZ Rb.rq,Rc.wq ! Operate format 


Operation: 


temp = 0
FOR i FROM 0 TO 63
    IF { Rbv<i> EQ 1 } THEN BREAK
    temp = temp + 1
END
Rc<6:0> ← temp<6:0>
Rc<63:7> ← 0


Exceptions: 


None 


Instruction mnemonics: 


CTTZ Count Trailing Zero 


Qualifiers: 


None 


Description: 


The number of trailing zeros in Rb, starting at the least significant bit position, is written to 
Rc. Ra must be R31. 
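
The three count operations (CTLZ, CTPOP, and CTTZ) can be transcribed almost directly from their Operation sections. The Python below is such a transcription, offered as a sketch; the lowercase function names are hypothetical and the argument is assumed to be an unsigned 64-bit register pattern.

```python
def ctlz(rbv):
    """Count leading zeros, scanning from bit 63 down to bit 0."""
    temp = 0
    for i in range(63, -1, -1):
        if (rbv >> i) & 1:
            break
        temp += 1
    return temp


def ctpop(rbv):
    """Count the one bits in positions 0 through 63."""
    temp = 0
    for i in range(64):
        if (rbv >> i) & 1:
            temp += 1
    return temp


def cttz(rbv):
    """Count trailing zeros, scanning from bit 0 up to bit 63."""
    temp = 0
    for i in range(64):
        if (rbv >> i) & 1:
            break
        temp += 1
    return temp
```

A zero operand gives 64 for both ctlz and cttz, which is why the result occupies the seven bits Rc<6:0> rather than six.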
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4.4.10 Longword Multiply 


Format: 
MULL Ra.rl,Rb.rl,Rc.wq !Operate format 
MULL Ra.rl,#b.ib,Rc.wq !Operate format 
Operation: 


Rc ← SEXT ((Rav * Rbv)<31:0>)


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


MULL Multiply Longword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Ra is multiplied by register Rb or a literal and the sign-extended 32-bit product is 
written to Rc. 


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 
32-bit product. Overflow detection is based on the longword product 
Rav<31:0> * Rbv<31:0>. On overflow, the proper sign extension of the least significant 32 
bits of the true result is written to the destination register. 


The MULQ instruction can be used to return the full 64-bit product. 
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4.4.11 Quadword Multiply 


Format: 
MULQ Ra.rq,Rb.rq,Rc.wq    !Operate format
MULQ Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← Rav * Rbv


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


MULQ Multiply Quadword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Ra is multiplied by register Rb or a literal and the 64-bit product is written to register
Rc. Overflow detection is based on considering the operands and the result as signed
quantities. On overflow, the least significant 64 bits of the true result are written to the
destination register.


The UMULH instruction can be used to generate the upper 64 bits of the 128-bit result when 
an overflow occurs. 
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4.4.12 Unsigned Quadword Multiply High 


Format: 
UMULH Ra.rq,Rb.rq,Rc.wq    !Operate format
UMULH Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← {Rav *U Rbv}<127:64>


Exceptions: 


None 


Instruction mnemonics: 


UMULH Unsigned Multiply Quadword High 


Qualifiers: 


None 


Description: 


Register Ra and Rb or a literal are multiplied as unsigned numbers to produce a 128-bit result. 
The high-order 64-bits are written to register Rc. 


The UMULH instruction can be used to generate the upper 64 bits of a 128-bit result as
follows:


Ra and Rb are unsigned: result of UMULH
Ra and Rb are signed: (result of UMULH) - Ra<63>*Rb - Rb<63>*Ra


The MULQ instruction gives the low 64 bits of the result in either case. 
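
The signed correction can be checked with a short Python sketch. The function names umulh and signed_mulh are hypothetical; operands are 64-bit register patterns, and the result is the upper 64 bits of the signed 128-bit product, also as a 64-bit pattern.

```python
MASK64 = (1 << 64) - 1


def umulh(a, b):
    """High 64 bits of the unsigned 128-bit product."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def signed_mulh(ra, rb):
    """(result of UMULH) - Ra<63>*Rb - Rb<63>*Ra, wrapped to 64 bits."""
    hi = umulh(ra, rb)
    if ra >> 63:        # Ra<63> is set: subtract Rb
        hi -= rb
    if rb >> 63:        # Rb<63> is set: subtract Ra
        hi -= ra
    return hi & MASK64
```

The correction follows from writing each signed operand as its unsigned pattern minus 2^64 times its sign bit; the cross terms land exactly in the upper quadword, and the 2^128 term falls outside the 128-bit result.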
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4.4.13 Longword Subtract 


Format: 
SUBL Ra.rl,Rb.rl,Rc.wq    !Operate format
SUBL Ra.rl,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← SEXT ((Rav - Rbv)<31:0>)


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


SUBL    Subtract Longword


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Rb or a literal is subtracted from register Ra and the sign-extended 32-bit difference 
is written to Rc. 


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated
32-bit difference. Overflow detection is based on the longword difference
Rav<31:0> - Rbv<31:0>.
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4.4.14 Scaled Longword Subtract 


Format: 


SxSUBL Ra.rl,Rb.rl,Rc.wq    !Operate format
SxSUBL Ra.rl,#b.ib,Rc.wq    !Operate format


Operation:


CASE
    S4SUBL: Rc ← SEXT (((LEFT_SHIFT(Rav,2)) - Rbv)<31:0>)
    S8SUBL: Rc ← SEXT (((LEFT_SHIFT(Rav,3)) - Rbv)<31:0>)
ENDCASE 


Exceptions: 


None 


Instruction mnemonics: 


S4SUBL Scaled Subtract Longword by 4 
S8SUBL Scaled Subtract Longword by 8 
Qualifiers: 
None 
Description: 


Register Rb or a literal is subtracted from the scaled value of register Ra, which is scaled by 4
(for S4SUBL) or 8 (for S8SUBL), and the sign-extended 32-bit difference is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the truncated 
32-bit difference. 




4.4.15 Quadword Subtract 


Format: 
SUBQ Ra.rq,Rb.rq,Rc.wq    !Operate format
SUBQ Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← Rav - Rbv


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


SUBQ Subtract Quadword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Rb or a literal is subtracted from register Ra and the 64-bit difference is written to
register Rc. On overflow, the least significant 64 bits of the true result are written to the
destination register.


The unsigned compare instructions can be used to generate borrow. If the minuend (Rav) is 
less unsigned than the subtrahend (Rbv), a borrow will occur. 
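
The borrow rule can be modeled the same way as a SUBQ paired with a CMPULT. In this Python sketch the function name subq_with_borrow is hypothetical, and both inputs are assumed to be unsigned 64-bit register patterns.

```python
MASK64 = (1 << 64) - 1


def subq_with_borrow(rav, rbv):
    """SUBQ then CMPULT rav,rbv: returns (64-bit difference, borrow)."""
    diff = (rav - rbv) & MASK64       # SUBQ keeps the low 64 bits
    borrow = 1 if rav < rbv else 0    # minuend less unsigned than subtrahend
    return diff, borrow
```

The borrow can then be subtracted from the next more significant quadword when implementing multiple-precision subtraction.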
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4.4.16 Scaled Quadword Subtract 


Format: 
SxSUBQ Ra.rq,Rb.rq,Rc.wq !Operate format 
SxSUBQ Ra.rq,#b.ib,Rc.wq !Operate format 
Operation: 
CASE 


    S4SUBQ: Rc ← LEFT_SHIFT(Rav,2) - Rbv
    S8SUBQ: Rc ← LEFT_SHIFT(Rav,3) - Rbv
ENDCASE 

Exceptions: 


None 


Instruction mnemonics: 


S4SUBQ Scaled Subtract Quadword by 4 
S8SUBQ Scaled Subtract Quadword by 8
Qualifiers: 
None 
Description: 


Register Rb or a literal is subtracted from the scaled value of register Ra, which is scaled by 4
(for S4SUBQ) or 8 (for S8SUBQ), and the 64-bit difference is written to Rc.
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4.5 Logical and Shift Instructions 


The logical instructions perform quadword Boolean operations. The conditional move integer 
instructions perform conditionals without a branch. The shift instructions perform left and 
right logical shift and right arithmetic shift. These are summarized in Table 4-6. 


Table 4-6: Logical and Shift Instructions Summary 


Mnemonic Operation 

AND Logical Product 

BIC Logical Product with Complement 
BIS Logical Sum (OR) 

EQV Logical Equivalence (XORNOT) 
ORNOT Logical Sum with Complement 
XOR Logical Difference 

CMOVxx Conditional Move Integer 

SLL Shift Left Logical

SRA Shift Right Arithmetic 

SRL Shift Right Logical 





Software Note: 


There is no arithmetic left shift instruction. Where an arithmetic left shift would be used, a 
logical shift will do. For multiplying by a small power of two in address computations, 
logical left shift is acceptable. 


Integer multiply should be used to perform an arithmetic left shift with overflow checking. 


Bit field extracts can be done with two logical shifts. Sign extension can be done with a left 
logical shift and a right arithmetic shift. 
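
The two shift idioms mentioned in the note can be sketched in Python. The helper names sll, srl, sra, extract_field, and sign_extend_field are hypothetical; the first three model the shift instructions on 64-bit patterns, using only the low six bits of the count.

```python
MASK64 = (1 << 64) - 1


def sll(rav, count):
    return (rav << (count & 63)) & MASK64


def srl(rav, count):
    return (rav & MASK64) >> (count & 63)


def sra(rav, count):
    signed = rav - (1 << 64) if (rav >> 63) & 1 else rav
    return (signed >> (count & 63)) & MASK64


def extract_field(value, lsb, width):
    """Unsigned field extract: SLL discards high bits, SRL right-justifies."""
    return srl(sll(value, 64 - lsb - width), 64 - width)


def sign_extend_field(value, width):
    """Sign-extend the low `width` bits: SLL then SRA by the same count."""
    return sra(sll(value, 64 - width), 64 - width)
```

Both helpers assume 1 <= width <= 64 and lsb + width <= 64, so that every shift count stays in the range 0 to 63.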
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4.5.1 Logical Functions 


Format: 
mnemonic Ra.rq,Rb.rq,Rc.wq    !Operate format
mnemonic Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
Rc ← Rav AND Rbv            !AND
Rc ← Rav OR Rbv             !BIS
Rc ← Rav XOR Rbv            !XOR
Rc ← Rav AND {NOT Rbv}      !BIC
Rc ← Rav OR {NOT Rbv}       !ORNOT
Rc ← Rav XOR {NOT Rbv}      !EQV
Exceptions: 
None 
Instruction mnemonics: 
AND Logical Product 
BIC Logical Product with Complement 
BIS Logical Sum (OR) 
EQV Logical Equivalence (XORNOT) 
ORNOT Logical Sum with Complement 
XOR Logical Difference 
Qualifiers: 
None 
Description: 


These instructions perform the designated Boolean function between register Ra and register 
Rb or a literal. The result is written to register Rc. 


The NOT function can be performed by doing an ORNOT with zero (Ra = R31). 
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4.5.2. Conditional Move Integer 


Format: 
CMOVxx Ra.rq,Rb.rq,Rc.wq    !Operate format
CMOVxx Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


IF TEST(Rav, Condition_based_on_Opcode) THEN
    Rc ← Rbv


Exceptions: 


None 


Instruction mnemonics: 


CMOVEQ CMOVE if Register Equal to Zero 
CMOVGE CMOVE if Register Greater Than or Equal to Zero 
CMOVGT CMOVE if Register Greater Than Zero 
CMOVLBC CMOVE if Register Low Bit Clear 
CMOVLBS CMOVE if Register Low Bit Set 
CMOVLE CMOVE if Register Less Than or Equal to Zero 
CMOVLT CMOVE if Register Less Than Zero 
CMOVNE CMOVE if Register Not Equal to Zero
Qualifiers: 
None 
Description: 


Register Ra is tested. If the specified relationship is true, the value Rbv is written to register
Rc.
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Notes: 


Except that it is likely in many implementations to be substantially faster, the instruction:
CMOVEQ Ra,Rb,Rc

is exactly equivalent to:
BNE Ra,label
OR Rb,Rb,Rc

label:

For example, a branchless sequence for:
R1=MAX(R1,R2)

is:


CMPLT R1,R2,R3 ! R3=1 if R1<R2
CMOVNE R3,R2,R1 ! Move R2 to R1 if R1<R2
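The conditional-move tests and the branchless MAX sequence can be modeled as follows. This is a minimal sketch, assuming registers are held as unsigned 64-bit Python integers; `cmov`, `cmplt`, `to_signed`, and `branchless_max` are hypothetical names used only for this illustration.

```python
MASK64 = (1 << 64) - 1


def to_signed(v):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


CMOV_TESTS = {
    "CMOVEQ": lambda a: a == 0,
    "CMOVNE": lambda a: a != 0,
    "CMOVLT": lambda a: to_signed(a) < 0,
    "CMOVLE": lambda a: to_signed(a) <= 0,
    "CMOVGT": lambda a: to_signed(a) > 0,
    "CMOVGE": lambda a: to_signed(a) >= 0,
    "CMOVLBC": lambda a: (a & 1) == 0,
    "CMOVLBS": lambda a: (a & 1) == 1,
}


def cmov(op, rav, rbv, rcv):
    """Return the new value of Rc: Rbv if the test on Rav holds, else Rc unchanged."""
    return rbv & MASK64 if CMOV_TESTS[op](rav & MASK64) else rcv & MASK64


def cmplt(rav, rbv):
    """Signed compare: 1 if Rav < Rbv, else 0."""
    return 1 if to_signed(rav) < to_signed(rbv) else 0


def branchless_max(r1, r2):
    r3 = cmplt(r1, r2)               # CMPLT  R1,R2,R3
    return cmov("CMOVNE", r3, r2, r1)  # CMOVNE R3,R2,R1
```

Because no branch is taken, the sequence has the same instruction path regardless of which operand is larger.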
4.5.3 Shift Logical


Format:
SxL Ra.rq,Rb.rq,Rc.wq !Operate format
SxL Ra.rq,#b.ib,Rc.wq !Operate format
Operation:


Rc <- LEFT_SHIFT(Rav, Rbv<5:0>) !SLL
Rc <- RIGHT_SHIFT(Rav, Rbv<5:0>) !SRL


Exceptions:


None


Instruction mnemonics:


SLL Shift Left Logical
SRL Shift Right Logical
Qualifiers:
None
Description:


Register Ra is shifted logically left or right 0 to 63 bits by the count in register Rb or a literal.
The result is written to register Rc. Zero bits are propagated into the vacated bit positions.
4.5.4 Shift Arithmetic


Format:
SRA Ra.rq,Rb.rq,Rc.wq !Operate format
SRA Ra.rq,#b.ib,Rc.wq !Operate format
Operation:


Rc <- ARITH_RIGHT_SHIFT(Rav, Rbv<5:0>)


Exceptions: 


None 


Instruction mnemonics: 


SRA Shift Right Arithmetic 


Qualifiers: 


None 


Description: 


Register Ra is right shifted arithmetically 0 to 63 bits by the count in register Rb or a literal.
The result is written to register Rc. The sign bit (Rav<63>) is propagated into the vacated bit 
positions. 
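The shift instructions, together with the earlier remark that bit fields can be extracted with two logical shifts and sign-extended with a left logical shift followed by a right arithmetic shift, can be sketched in Python as below. The helper names are hypothetical, and the field helpers assume 1 <= width and lsb + width <= 64.

```python
MASK64 = (1 << 64) - 1


def sll(rav, rbv):
    return (rav << (rbv & 63)) & MASK64   # only Rbv<5:0> is used as the count


def srl(rav, rbv):
    return (rav & MASK64) >> (rbv & 63)   # zeros enter from the left


def sra(rav, rbv):
    rav &= MASK64
    signed = rav - (1 << 64) if rav >> 63 else rav
    return (signed >> (rbv & 63)) & MASK64  # copies of Rav<63> enter from the left


def extract_field(value, lsb, width):
    """Zero-extended field value<lsb+width-1:lsb>, using SLL then SRL."""
    return srl(sll(value, 64 - lsb - width), 64 - width)


def sign_extend_field(value, lsb, width):
    """Sign-extended field value<lsb+width-1:lsb>, using SLL then SRA."""
    return sra(sll(value, 64 - lsb - width), 64 - width)
```

The left shift moves the top bit of the field into bit 63 and discards everything above it; the right shift then brings the field down to bit 0, filling with zeros (SRL) or with the field's sign (SRA).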
4.6 Byte Manipulation Instructions


Alpha implementations that support the BWX extension provide the following instructions for 
loading, sign-extending, and storing bytes and words between a register and memory: 


Instruction Meaning Described in Section 
LDBU/LDWU Load byte/word unaligned 4.2.2 
SEXTB/SEXTW Sign-extend byte/word 4.6.5 
STB/STW Store byte/word 4.2.6 


The AMASK and IMPLVER instructions report whether a particular Alpha implementation 
supports the BWX extension. AMASK and IMPLVER are described in Sections 4.11.1 and 
4.11.6, respectively, and in Appendix D. 


LDBU and STB are the recommended way to perform byte load and store operations on Alpha
implementations that support them; use them rather than the extract, insert, and mask byte
instructions described in this section. In particular, the implementation examples in this
section that illustrate byte operations are not appropriate for Alpha implementations that
support the BWX extension; instead, use the recommendations in Appendix A.


In addition to LDBU and STB, Alpha provides the instructions in Table 4-7 for operating on
byte operands within registers.


Table 4-7: Byte-Within-Register Manipulation Instructions Summary


Mnemonic Operation
CMPBGE Compare Byte
EXTBL Extract Byte Low
EXTWL Extract Word Low
EXTLL Extract Longword Low
EXTQL Extract Quadword Low
EXTWH Extract Word High
EXTLH Extract Longword High
EXTQH Extract Quadword High
INSBL Insert Byte Low
INSWL Insert Word Low
INSLL Insert Longword Low
INSQL Insert Quadword Low
INSWH Insert Word High
INSLH Insert Longword High
INSQH Insert Quadword High
MSKBL Mask Byte Low
MSKWL Mask Word Low
MSKLL Mask Longword Low
MSKQL Mask Quadword Low
MSKWH Mask Word High
MSKLH Mask Longword High
MSKQH Mask Quadword High
SEXTB Sign extend byte
SEXTW Sign extend word
ZAP Zero Bytes
ZAPNOT Zero Bytes Not
4.6.1 Compare Byte


Format:
CMPBGE Ra.rq,Rb.rq,Rc.wq !Operate format
CMPBGE Ra.rq,#b.ib,Rc.wq !Operate format
Operation:


FOR i FROM 0 TO 7
temp<8:0> <- {0 || Rav<i*8+7:i*8>} + {0 || NOT Rbv<i*8+7:i*8>} + 1
Rc<i> <- temp<8>
END
Rc<63:8> <- 0


Exceptions: 


None 


Instruction mnemonics: 


CMPBGE Compare Byte 


Qualifiers: 


None 


Description: 


CMPBGE does eight parallel unsigned byte comparisons between corresponding bytes of Rav 
and Rbv, storing the eight results in the low eight bits of Rc. The high 56 bits of Rc are set to 
zero. Bit 0 of Rc corresponds to byte 0, bit 1 of Rc corresponds to byte 1, and so forth. A 
result bit is set in Rc if the corresponding byte of Rav is greater than or equal to the
corresponding byte of Rbv (unsigned).
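The Operation block can be read as eight 9-bit additions whose carry-out becomes the result bit. The following Python sketch follows that pseudocode directly; `cmpbge` and `first_zero_byte` are hypothetical helper names, and the second function only illustrates how the result mask might be inspected in software.

```python
def cmpbge(rav, rbv):
    """Eight parallel unsigned byte compares; bit i is 1 if byte i of Rav >= byte i of Rbv."""
    rc = 0
    for i in range(8):
        a = (rav >> (8 * i)) & 0xFF
        b = (rbv >> (8 * i)) & 0xFF
        temp = a + (~b & 0xFF) + 1        # 9-bit sum; bit 8 is the carry out
        rc |= ((temp >> 8) & 1) << i
    return rc                             # bits <63:8> are zero


def first_zero_byte(quadword):
    """Index of the lowest-addressed zero byte, or None (models CMPBGE R31,Rx,Ry)."""
    mask = cmpbge(0, quadword)            # 0 >= byte holds only when byte == 0
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1
```

Comparing R31 (zero) against a loaded quadword therefore yields a nonzero mask exactly when the quadword contains a zero byte.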


Notes: 
The result of CMPBGE can be used as an input to ZAP and ZAPNOT. 


To scan for a byte of zeros in a character string:


<initialize R1 to aligned QW address of string>
LOOP:
LDQ R2, 0(R1) ; Pick up 8 bytes
LDA R1, 8(R1) ; Increment string pointer
CMPBGE R31, R2, R3 ; If NO bytes of zero, R3<7:0>=0
BEQ R3, LOOP ; Loop if no terminator byte found
... ; At this point, R3 can be used to
... ; determine which byte terminated


DIGITAL Restricted Distribution 


Instruction Descriptions (I) 4-49 


To compare two character strings for greater/equal/less: 


<initialize R1 to aligned QW address of string1>
<initialize R2 to aligned QW address of string2>


LOOP:
LDQ R3, 0(R1) ; Pick up 8 bytes of string1
LDA R1, 8(R1) ; Increment string1 pointer
LDQ R4, 0(R2) ; Pick up 8 bytes of string2
LDA R2, 8(R2) ; Increment string2 pointer
CMPBGE R31, R3, R6 ; Test for zeros in string1
XOR R3, R4, R5 ; Test for all equal bytes
BNE R6, DONE ; Exit if a zero found
BEQ R5, LOOP ; Loop if all equal

DONE: CMPBGE R31, R5, R5 ;


; At this point, R5 can be used to determine the first not-equal 
; byte position (if any), and R6 can be used to determine the 
; position of the terminating zero in stringl (if any). 


To range-check a string of characters in R1 for ‘0’...‘9’: 


LDQ R2, lit0s ; Pick up 8 bytes of the character
 ; BELOW ‘0’ ‘////////’
LDQ R3, lit9s ; Pick up 8 bytes of the character
 ; ABOVE ‘9’ ‘::::::::’
CMPBGE R2, R1, R4 ; Some R4<i>=1 if character is LT ‘0’
CMPBGE R1, R3, R5 ; Some R5<i>=1 if character is GT ‘9’
BNE R4, ERROR ; Branch if some char too low
BNE R5, ERROR ; Branch if some char too high
4.6.2 Extract Byte


Format:
EXTxx Ra.rq,Rb.rq,Rc.wq !Operate format
EXTxx Ra.rq,#b.ib,Rc.wq !Operate format
Operation:
CASE
big_endian_data: Rbv’ <- Rbv XOR 111₂
little_endian_data: Rbv’ <- Rbv
ENDCASE


CASE
EXTBL: byte_mask <- 0000 0001₂
EXTWx: byte_mask <- 0000 0011₂
EXTLx: byte_mask <- 0000 1111₂
EXTQx: byte_mask <- 1111 1111₂
ENDCASE


CASE
EXTxL:
byte_loc <- Rbv’<2:0>*8
temp <- RIGHT_SHIFT(Rav, byte_loc<5:0>)
Rc <- BYTE_ZAP(temp, NOT(byte_mask))
EXTxH:
byte_loc <- 64 - Rbv’<2:0>*8
temp <- LEFT_SHIFT(Rav, byte_loc<5:0>)
Rc <- BYTE_ZAP(temp, NOT(byte_mask))
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


EXTBL Extract Byte Low 
EXTWL Extract Word Low 
EXTLL Extract Longword Low 
EXTQL Extract Quadword Low 
EXTWH Extract Word High 
EXTLH Extract Longword High 
EXTQH Extract Quadword High 
Qualifiers: 
None 
Description:


EXTxL shifts register Ra right by 0 to 7 bytes, inserts zeros into vacated bit positions, and
then extracts 1, 2, 4, or 8 bytes into register Rc. EXTxH shifts register Ra left by 0 to 7 bytes,
inserts zeros into vacated bit positions, and then extracts 2, 4, or 8 bytes into register Rc. The
number of bytes to shift is specified by Rbv’<2:0>. The number of bytes to extract is
specified in the function code. Remaining bytes are filled with zeros.
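The extract operations can be modeled in a few lines of Python. This is a sketch for little-endian data only, assuming memory is a `bytes` object padded to a multiple of 8; `byte_zap`, `ext_low`, `ext_high`, `ldq_u`, and `load_unaligned_quad` are hypothetical names chosen for this illustration.

```python
MASK64 = (1 << 64) - 1
WIDTH_MASK = {"B": 0x01, "W": 0x03, "L": 0x0F, "Q": 0xFF}  # byte_mask per data size


def byte_zap(value, mask):
    """Zero byte i of value wherever bit i of mask is set."""
    for i in range(8):
        if (mask >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64


def ext_low(size, rav, rbv):
    byte_loc = (rbv & 7) * 8
    temp = (rav & MASK64) >> byte_loc
    return byte_zap(temp, ~WIDTH_MASK[size] & 0xFF)


def ext_high(size, rav, rbv):
    byte_loc = (64 - (rbv & 7) * 8) & 63      # only byte_loc<5:0> is used
    temp = (rav << byte_loc) & MASK64
    return byte_zap(temp, ~WIDTH_MASK[size] & 0xFF)


def ldq_u(mem, addr):
    """Load the aligned quadword containing addr (address bits <2:0> ignored)."""
    base = addr & ~7
    return int.from_bytes(mem[base:base + 8], "little")


def load_unaligned_quad(mem, x):
    r1 = ldq_u(mem, x)            # LDQ_U R1, X
    r2 = ldq_u(mem, x + 7)        # LDQ_U R2, X+7
    return ext_high("Q", r2, x) | ext_low("Q", r1, x)
```

When X is already aligned, both loads return the same quadword, EXTQL shifts by zero, and EXTQH also shifts by zero because only the low six bits of the shift count are used, so the OR still produces the correct value.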


Notes: 


The comments in the examples below assume that the effective address (ea) of X(R11) is such
that (ea mod 8) = 5, the value of the aligned quadword containing X(R11) is CBAx xxxx, the
value of the aligned quadword containing X+7(R11) is yyyH GFED, and the datum is
little-endian.


The examples below are the most general case unless otherwise noted; if more information is 
known about the value or intended alignment of X, shorter sequences can be used. 


The intended sequence for loading a quadword from unaligned address X(R11) is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = CBAx xxxx
LDQ_U R2, X+7(R11) ; Ignores va<2:0>, R2 = yyyH GFED
LDA R3, X(R11) ; R3<2:0> = (X mod 8) = 5
EXTQL R1, R3, R1 ; R1 = 0000 0CBA
EXTQH R2, R3, R2 ; R2 = HGFE D000
OR R2, R1, R1 ; R1 = HGFE DCBA


The intended sequence for loading and zero-extending a longword from unaligned address X
is:


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = CBAx xxxx
LDQ_U R2, X+3(R11) ; Ignores va<2:0>, R2 = yyyy yyyD
LDA R3, X(R11) ; R3<2:0> = (X mod 8) = 5
EXTLL R1, R3, R1 ; R1 = 0000 0CBA
EXTLH R2, R3, R2 ; R2 = 0000 D000
OR R2, R1, R1 ; R1 = 0000 DCBA


The intended sequence for loading and sign-extending a longword from unaligned address X 
is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = CBAx xxxx
LDQ_U R2, X+3(R11) ; Ignores va<2:0>, R2 = yyyy yyyD
LDA R3, X(R11) ; R3<2:0> = (X mod 8) = 5
EXTLL R1, R3, R1 ; R1 = 0000 0CBA
EXTLH R2, R3, R2 ; R2 = 0000 D000
OR R2, R1, R1 ; R1 = 0000 DCBA
ADDL R31, R1, R1 ; R1 = ssss DCBA
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending a word from unaligned address X is:


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yBAx xxxx
LDQ_U R2, X+1(R11) ; Ignores va<2:0>, R2 = yBAx xxxx
LDA R3, X(R11) ; R3<2:0> = (X mod 8) = 5
EXTWL R1, R3, R1 ; R1 = 0000 00BA
EXTWH R2, R3, R2 ; R2 = 0000 0000
OR R2, R1, R1 ; R1 = 0000 00BA


For software that is not designed to use the BWX extension, the intended sequence for loading 
and sign-extending a word from unaligned address X is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yBAx xxxx
LDQ_U R2, X+1(R11) ; Ignores va<2:0>, R2 = yBAx xxxx
LDA R3, X+1+1(R11) ; R3<2:0> = 5+1+1 = 7
EXTQL R1, R3, R1 ; R1 = 0000 000y
EXTQH R2, R3, R2 ; R2 = BAxx xxx0
OR R2, R1, R1 ; R1 = BAxx xxxy
SRA R1, #48, R1 ; R1 = ssss ssBA


For software that is not designed to use the BWX extension, the intended sequence for loading 
and zero-extending a byte from address X is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yyAx xxxx
LDA R3, X(R11) ; R3<2:0> = (X mod 8) = 5
EXTBL R1, R3, R1 ; R1 = 0000 000A


For software that is not designed to use the BWX extension, the intended sequence for loading 
and sign-extending a byte from address X is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yyAx xxxx
LDA R3, X+1(R11) ; R3<2:0> = (X + 1) mod 8, i.e.,
 ; convert byte position within
 ; quadword to one-origin based
EXTQH R1, R3, R1 ; Places the desired byte into byte 7
 ; of R1.final by left shifting
 ; R1.initial by ( 8 - R3<2:0> ) byte
 ; positions
SRA R1, #56, R1 ; Arithmetic Shift of byte 7 down
 ; into byte 0,


Optimized examples:


Assume that a word fetch is needed from 10(R3), where R3 is intended to contain a
longword-aligned address. The optimized sequences below take advantage of the known constant
offset, and the longword alignment (hence a single aligned longword contains the entire
word). The sequences generate a Data Alignment Fault if R3 does not contain a
longword-aligned address.
For software that is not designed to use the BWX extension, the intended sequence for loading
and zero-extending an aligned word from 10(R3) is:


LDL R1, 8(R3) ; R1 = ssss BAxx
 ; Faults if R3 is not longword aligned
EXTWL R1, #2, R1 ; R1 = 0000 00BA


For software that is not designed to use the BWX extension, the intended sequence for loading 
and sign-extending an aligned word from 10(R3) is: 


LDL R1, 8(R3) ; R1 = ssss BAxx
 ; Faults if R3 is not longword aligned
SRA R1, #16, R1 ; R1 = ssss ssBA


Big-endian examples:


For software that is not designed to use the BWX extension, the intended sequence for loading 
and zero-extending a byte from address X is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = xxxx xAyy
LDA R3, X(R11) ; R3<2:0> = 5, shift will be 2 bytes
EXTBL R1, R3, R1 ; R1 = 0000 000A


The intended sequence for loading a quadword from unaligned address X(R11) is: 


LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = xxxx xABC
LDQ_U R2, X+7(R11) ; Ignores va<2:0>, R2 = DEFG Hyyy
LDA R3, X+7(R11) ; R3<2:0> = 4, shift will be 3 bytes
EXTQH R1, R3, R1 ; R1 = ABC0 0000
EXTQL R2, R3, R2 ; R2 = 000D EFGH
OR R1, R2, R1 ; R1 = ABCD EFGH


Note that the address in the LDA instruction for big-endian quadwords is X+7, for longwords 
is X+3, and for words is X+1; for little-endian, these are all just X. Also note that the EXTQH 
and EXTQL instructions are reversed with respect to the little-endian sequence. 
4.6.3 Byte Insert


Format:
INSxx Ra.rq,Rb.rq,Rc.wq !Operate format
INSxx Ra.rq,#b.ib,Rc.wq !Operate format
Operation:
CASE
big_endian_data: Rbv’ <- Rbv XOR 111₂
little_endian_data: Rbv’ <- Rbv
ENDCASE


CASE
INSBL: byte_mask <- 0000 0000 0000 0001₂
INSWx: byte_mask <- 0000 0000 0000 0011₂
INSLx: byte_mask <- 0000 0000 0000 1111₂
INSQx: byte_mask <- 0000 0000 1111 1111₂
ENDCASE
byte_mask <- LEFT_SHIFT(byte_mask, Rbv’<2:0>)


CASE
INSxL:
byte_loc <- Rbv’<2:0>*8
temp <- LEFT_SHIFT(Rav, byte_loc<5:0>)
Rc <- BYTE_ZAP(temp, NOT(byte_mask<7:0>))
INSxH:
byte_loc <- 64 - Rbv’<2:0>*8
temp <- RIGHT_SHIFT(Rav, byte_loc<5:0>)
Rc <- BYTE_ZAP(temp, NOT(byte_mask<15:8>))
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


INSBL Insert Byte Low 
INSWL Insert Word Low 
INSLL Insert Longword Low 
INSQL Insert Quadword Low 
INSWH Insert Word High 
INSLH Insert Longword High 
INSQH Insert Quadword High 
Qualifiers:


None 


Description: 


INSxL and INSxH shift bytes from register Ra and insert them into a field of zeros, storing the
result in register Rc. Register Rbv’<2:0> selects the shift amount, and the function code
selects the maximum field width: 1, 2, 4, or 8 bytes. The instructions can generate a byte, 
word, longword, or quadword datum that is spread across two registers at an arbitrary byte 
alignment. 
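A Python sketch of the insert operations, for little-endian data only, may make the 16-bit byte mask easier to follow. The names `byte_zap`, `ins_low`, and `ins_high` are hypothetical; the logic mirrors the Operation pseudocode above.

```python
MASK64 = (1 << 64) - 1
WIDTH_MASK = {"B": 0x01, "W": 0x03, "L": 0x0F, "Q": 0xFF}


def byte_zap(value, mask):
    """Zero byte i of value wherever bit i of mask is set."""
    for i in range(8):
        if (mask >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64


def ins_low(size, rav, rbv):
    shift = rbv & 7
    byte_mask = WIDTH_MASK[size] << shift          # may extend into bits <15:8>
    temp = (rav << (shift * 8)) & MASK64
    return byte_zap(temp, ~byte_mask & 0xFF)       # keep bytes named by byte_mask<7:0>


def ins_high(size, rav, rbv):
    shift = rbv & 7
    byte_mask = WIDTH_MASK[size] << shift
    byte_loc = (64 - shift * 8) & 63               # only byte_loc<5:0> is used
    temp = (rav & MASK64) >> byte_loc
    return byte_zap(temp, ~(byte_mask >> 8) & 0xFF)  # keep bytes named by byte_mask<15:8>
```

With a shift of zero, byte_mask<15:8> is empty, so the high form returns zero; the datum then lies entirely in the low result.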
4.6.4 Byte Mask


Format:
MSKxx Ra.rq,Rb.rq,Rc.wq !Operate format
MSKxx Ra.rq,#b.ib,Rc.wq !Operate format
Operation:
CASE
big_endian_data: Rbv’ <- Rbv XOR 111₂
little_endian_data: Rbv’ <- Rbv
ENDCASE


CASE
MSKBL: byte_mask <- 0000 0000 0000 0001₂
MSKWx: byte_mask <- 0000 0000 0000 0011₂
MSKLx: byte_mask <- 0000 0000 0000 1111₂
MSKQx: byte_mask <- 0000 0000 1111 1111₂
ENDCASE
byte_mask <- LEFT_SHIFT(byte_mask, Rbv’<2:0>)


CASE
MSKxL:
Rc <- BYTE_ZAP(Rav, byte_mask<7:0>)
MSKxH:
Rc <- BYTE_ZAP(Rav, byte_mask<15:8>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


MSKBL Mask Byte Low 
MSKWL Mask Word Low 
MSKLL Mask Longword Low 
MSKQL Mask Quadword Low 
MSKWH Mask Word High 
MSKLH Mask Longword High 
MSKQH Mask Quadword High 
Qualifiers: 
None 
Description:


MSKxL and MSKxH set selected bytes of register Ra to zero, storing the result in register Rc.
Register Rbv’<2:0> selects the starting position of the field of zero bytes, and the function
code selects the maximum width: 1, 2, 4, or 8 bytes. The instructions generate a byte, word, 
longword, or quadword field of zeros that can spread across two registers at an arbitrary byte 
alignment. 


Notes: 


The comments in the examples below assume that the effective address (ea) of X(R11) is such 
that (ea mod 8) = 5, the value of the aligned quadword containing X(R11) is CBAx xxxx, the 
value of the aligned quadword containing X+7(R11) is yyyH GFED, the value to be stored 
from R5 is HGFE DCBA, and the datum is little-endian. Slight modifications similar to those 
in Section 4.6.2 apply to big-endian data. 


The examples below are the most general case; if more information is known about the value 
or intended alignment of X, shorter sequences can be used. 


The intended sequence for storing an unaligned quadword R5 at address X(R11) is: 


LDA R6, X(R11) ; R6<2:0> = (X mod 8) = 5
LDQ_U R2, X+7(R11) ; Ignores va<2:0>, R2 = yyyH GFED
LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = CBAx xxxx
INSQH R5, R6, R4 ; R4 = 000H GFED
INSQL R5, R6, R3 ; R3 = CBA0 0000
MSKQH R2, R6, R2 ; R2 = yyy0 0000
MSKQL R1, R6, R1 ; R1 = 000x xxxx
OR R2, R4, R2 ; R2 = yyyH GFED
OR R1, R3, R1 ; R1 = CBAx xxxx
STQ_U R2, X+7(R11) ; Must store high then low for
STQ_U R1, X(R11) ; degenerate case of aligned QW
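The unaligned quadword store can be simulated in Python to see why the high quadword is stored first. This is a sketch for little-endian data, assuming memory is a `bytearray` padded to a multiple of 8; every function name here is a hypothetical helper, not an architected interface.

```python
MASK64 = (1 << 64) - 1


def byte_zap(value, mask):
    """Zero byte i of value wherever bit i of mask is set."""
    for i in range(8):
        if (mask >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64


def insq(rav, rbv):
    """Return (INSQL, INSQH) results for a quadword datum."""
    shift = rbv & 7
    byte_mask = 0xFF << shift
    low = byte_zap((rav << (shift * 8)) & MASK64, ~byte_mask & 0xFF)
    high = byte_zap((rav & MASK64) >> ((64 - shift * 8) & 63), ~(byte_mask >> 8) & 0xFF)
    return low, high


def mskq(rav_low, rav_high, rbv):
    """Return (MSKQL of rav_low, MSKQH of rav_high)."""
    byte_mask = 0xFF << (rbv & 7)
    return byte_zap(rav_low, byte_mask & 0xFF), byte_zap(rav_high, (byte_mask >> 8) & 0xFF)


def store_unaligned_quad(mem, x, r5):
    lo_base, hi_base = x & ~7, (x + 7) & ~7
    r2 = int.from_bytes(mem[hi_base:hi_base + 8], "little")   # LDQ_U R2, X+7
    r1 = int.from_bytes(mem[lo_base:lo_base + 8], "little")   # LDQ_U R1, X
    r3, r4 = insq(r5, x)                                      # INSQL, INSQH
    r1, r2 = mskq(r1, r2, x)                                  # MSKQL, MSKQH
    r2 |= r4
    r1 |= r3
    mem[hi_base:hi_base + 8] = r2.to_bytes(8, "little")       # STQ_U high first
    mem[lo_base:lo_base + 8] = r1.to_bytes(8, "little")       # STQ_U low last
```

In the aligned case both stores target the same quadword: the high store writes back the old contents unchanged, and the low store, issued last, writes the new datum. Reversing the order would overwrite the new datum with the old one.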


The intended sequence for storing an unaligned longword R5 at X is:


LDA R6, X(R11) ; R6<2:0> = (X mod 8) = 5
LDQ_U R2, X+3(R11) ; Ignores va<2:0>, R2 = yyyy yyyD
LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = CBAx xxxx
INSLH R5, R6, R4 ; R4 = 0000 000D
INSLL R5, R6, R3 ; R3 = CBA0 0000
MSKLH R2, R6, R2 ; R2 = yyyy yyy0
MSKLL R1, R6, R1 ; R1 = 000x xxxx
OR R2, R4, R2 ; R2 = yyyy yyyD
OR R1, R3, R1 ; R1 = CBAx xxxx
STQ_U R2, X+3(R11) ; Must store high then low for
STQ_U R1, X(R11) ; degenerate case of aligned
For software that is not designed to use the BWX extension, the intended sequence for storing
an unaligned word R5 at X is:


LDA R6, X(R11) ; R6<2:0> = (X mod 8) = 5
LDQ_U R2, X+1(R11) ; Ignores va<2:0>, R2 = yBAx xxxx
LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yBAx xxxx
INSWH R5, R6, R4 ; R4 = 0000 0000
INSWL R5, R6, R3 ; R3 = 0BA0 0000
MSKWH R2, R6, R2 ; R2 = yBAx xxxx
MSKWL R1, R6, R1 ; R1 = y00x xxxx
OR R2, R4, R2 ; R2 = yBAx xxxx
OR R1, R3, R1 ; R1 = yBAx xxxx
STQ_U R2, X+1(R11) ; Must store high then low for
STQ_U R1, X(R11) ; degenerate case of aligned


For software that is not designed to use the BWX extension, the intended sequence for storing
a byte R5 at X is:


LDA R6, X(R11) ; R6<2:0> = (X mod 8) = 5
LDQ_U R1, X(R11) ; Ignores va<2:0>, R1 = yyAx xxxx
INSBL R5, R6, R3 ; R3 = 00A0 0000
MSKBL R1, R6, R1 ; R1 = yy0x xxxx
OR R1, R3, R1 ; R1 = yyAx xxxx
STQ_U R1, X(R11)
4.6.5 Sign Extend


Format:
SEXTx Rb.rq,Rc.wq !Operate format
SEXTx #b.ib,Rc.wq !Operate format
Operation:
CASE


SEXTB: Rc <- SEXT(Rbv<07:0>)
SEXTW: Rc <- SEXT(Rbv<15:0>)
ENDCASE 


Exceptions: 


None 


Instruction mnemonics: 


SEXTB Sign Extend Byte 
SEXTW Sign Extend Word 
Qualifiers: 
None 
Description: 


The byte or word in register Rb is sign-extended to 64 bits and written to register Rc. Ra must 
be R31. 


Implementation Note: 


The SEXTB and SEXTW instructions are supported in hardware on Alpha 
implementations for which the AMASK instruction returns bit 0 set. SEXTB and SEXTW 
are supported with software emulation in Alpha implementations for which AMASK does 
not return bit 0 set. Software emulation of SEXTB and SEXTW is significantly slower 
than hardware support. 
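For reference, the SEXT operation can be expressed in Python as below. This is only a sketch of the arithmetic, assuming 64-bit register values held as unsigned integers; `sext`, `sextb`, and `sextw` are hypothetical names and do not describe how any emulation layer is actually implemented.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def sextb(rbv):
    return sext(rbv, 8)    # Rc <- SEXT(Rbv<07:0>)


def sextw(rbv):
    return sext(rbv, 16)   # Rc <- SEXT(Rbv<15:0>)
```

Bits of Rbv above the byte or word are ignored, so `sextb(0x17F)` yields 0x7F.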
4.6.6 Zero Bytes


Format:
ZAPx Ra.rq,Rb.rq,Rc.wq !Operate format
ZAPx Ra.rq,#b.ib,Rc.wq !Operate format
Operation:


CASE
ZAP:
Rc <- BYTE_ZAP(Rav, Rbv<7:0>)
ZAPNOT:
Rc <- BYTE_ZAP(Rav, NOT Rbv<7:0>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


ZAP Zero Bytes 
ZAPNOT Zero Bytes Not 
Qualifiers: 
None 
Description: 


ZAP and ZAPNOT set selected bytes of register Ra to zero and store the result in register Rc. 
Register Rb<7:0> selects the bytes to be zeroed. Bit 0 of Rbv corresponds to byte 0, bit 1 of
Rbv corresponds to byte 1, and so on. A result byte is set to zero if the corresponding bit of
Rbv is a one for ZAP and a zero for ZAPNOT.
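A short Python sketch of BYTE_ZAP, ZAP, and ZAPNOT follows. The function names are hypothetical helpers for illustration; the behavior is taken from the Operation block and description above.

```python
MASK64 = (1 << 64) - 1


def byte_zap(value, mask):
    """Zero byte i of value wherever bit i of the 8-bit mask is set."""
    for i in range(8):
        if (mask >> i) & 1:
            value &= ~(0xFF << (8 * i))
    return value & MASK64


def zap(rav, rbv):
    return byte_zap(rav, rbv & 0xFF)      # zero the bytes selected by Rbv<7:0>


def zapnot(rav, rbv):
    return byte_zap(rav, ~rbv & 0xFF)     # keep only the bytes selected by Rbv<7:0>
```

For instance, `zapnot(x, 0x0F)` keeps the low longword of `x` and zeros the high longword, which zero-extends a 32-bit value.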
4.7 Floating-Point Instructions


Alpha provides instructions for operating on floating-point operands in each of four data
formats:


• F_floating (VAX single)
• G_floating (VAX double, 11-bit exponent)
• S_floating (IEEE single)
• T_floating (IEEE double, 11-bit exponent)


Data conversion instructions are also provided to convert operands between floating-point and 
quadword integer formats, between double and single floating, and between quadword and 
longword integers. 


Note: 


D_floating is a partially supported datatype; no D_floating arithmetic operations are 
provided in the architecture. For backward compatibility, exact D_floating arithmetic may 
be provided via software emulation. D_floating "format compatibility," in which binary 
files of D_floating numbers may be processed but without the last 3 bits of fraction 
precision, can be obtained via conversions to G_floating, G arithmetic operations, then 
conversion back to D_floating. 


The choice of data formats is encoded in each instruction. Each instruction also encodes the 
choice of rounding mode and the choice of trapping mode. 


All floating-point operate instructions (not including loads or stores) that yield an F_floating 
or G_floating zero result must materialize a true zero. 


4.7.1 Single-Precision Operations 


Single-precision values (F_floating or S_floating) are stored in the floating-point registers in 
canonical form, as subsets of double-precision values, with 11-bit exponents restricted to the 
corresponding single-precision range, and with the 29 low-order fraction bits restricted to be 
all zero. 


Single-precision operations applied to canonical single-precision values give single-precision
results. Single-precision operations applied to non-canonical operands give UNPREDICTABLE
results.


Longword integer values in floating-point registers are stored in bits <63:62,58:29>, with bits 
<61:59> ignored and zeros in bits <28:0>. 
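The register layout for longword integers described above can be sketched as a pair of bit-shuffling helpers. This is an illustrative model only, assuming the longword is supplied as a 32-bit pattern; `pack_longword` and `unpack_longword` are hypothetical names, and the sketch writes zeros into the ignored bits <61:59> purely for simplicity.

```python
def pack_longword(lw):
    """Place longword bits <31:30> in register bits <63:62> and <29:0> in <58:29>."""
    lw &= 0xFFFFFFFF
    return ((lw >> 30) << 62) | ((lw & 0x3FFFFFFF) << 29)


def unpack_longword(reg):
    """Recover the longword, ignoring register bits <61:59> and <28:0>."""
    return (((reg >> 62) & 0x3) << 30) | ((reg >> 29) & 0x3FFFFFFF)
```

Because bits <61:59> are ignored on the way out, any value in those positions leaves the recovered longword unchanged.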


4.7.2 Subsets and Faults 


All floating-point operations may take floating disabled faults. Any subsetted floating-point
instruction may take an Illegal Instruction Trap. These faults are not explicitly listed in the
description of each instruction.


All floating-point loads and stores may take memory management faults (access control
violation, translation not valid, fault on read/write, data alignment).


The floating-point enable (FEN) internal processor register (IPR) allows system software to 
restrict access to the floating-point registers. 


If a floating-point instruction is implemented and FEN = 0, attempts to execute the instruction 
cause a floating disabled fault. 


If a floating-point instruction is not implemented, attempts to execute the instruction cause an 
Illegal Instruction Trap. This rule holds regardless of the value of FEN. 


An Alpha implementation may provide both VAX and IEEE floating-point operations, either, 
or none. 


Some floating-point instructions are common to the VAX and IEEE subsets, some are VAX 
only, and some are IEEE only. These are designated in the descriptions that follow. If either 
subset is implemented, all the common instructions must be implemented. 


An implementation that includes IEEE floating-point may subset the ability to perform
rounding to plus infinity and minus infinity. If not implemented, instructions requesting these
rounding modes take an Illegal Instruction Trap.


An implementation that includes IEEE floating-point may implement any subset of the Trap
Disable flags (DNOD, DZED, INED, INVD, OVFD, and UNFD) and Denormal Control flags
(DNZ and UNDZ) in the FPCR:


• If a Trap Disable flag is not implemented, then the corresponding trap occurs as usual.


• If DNZ is not implemented, then any IEEE operation with a denormal input must take
an Invalid Operation Trap.


• If UNDZ is not implemented, then any IEEE operation that includes a /S qualifier that
underflows must take an Underflow Trap.


• If DZED is implemented, then IEEE division of 0/0 must be treated as an invalid
operation instead of a division by zero.


Any unimplemented bits in the FPCR are read as zero and ignored when set. 


4.7.3 Definitions 


The following definitions apply to Alpha floating-point support. 


Alpha finite number 


A floating-point number with a definite, in-range value. Specifically, all numbers in the
inclusive ranges -MAX through -MIN, zero, and +MIN through +MAX, where MAX is the largest
non-infinite representable floating-point number and MIN is the smallest non-zero
representable normalized floating-point number.




For VAX floating-point, finites do not include reserved operands or dirty zeros (this differs 
from the usual VAX interpretation of dirty zeros as finite). For IEEE floating-point, finites do 
not include infinites, NaNs, or denormals, but do include minus zero. 


denormal 


An IEEE floating-point bit pattern that represents a number whose magnitude lies between 
zero and the smallest finite number. 


dirty zero 


A VAX floating-point bit pattern that represents a zero value, but not in true-zero form. 


infinity 


An IEEE floating-point bit pattern that represents plus or minus infinity. 


LSB 


The least significant bit. For a positive finite representable number A, A + 1 LSB is the next
larger representable number, and A + 1/2 LSB is exactly halfway between A and the next
larger representable number. For a positive representable number A whose fraction field is not
all zeros, A - 1 LSB is the next smaller representable number, and A - 1/2 LSB is exactly
halfway between A and the next smaller representable number.


non-finite number 


An IEEE infinity, NaN, denormal number, or a VAX dirty zero or reserved operand. 


Not-a-Number 


An IEEE floating-point bit pattern that represents something other than a number. This comes
in two forms: signaling NaNs (for Alpha, those with an initial fraction bit of 0) and quiet
NaNs (for Alpha, those with an initial fraction bit of 1).


representable result 


A real number that can be represented exactly as a VAX or IEEE floating-point number, with 
finite precision and bounded exponent range. 


reserved operand 


A VAX floating-point bit pattern that represents an illegal value. 


trap shadow 


The set of instructions potentially executed after an instruction that signals an arithmetic trap 
but before the trap is actually taken. 


true result 


The mathematically correct result of an operation, assuming that the input operand values are 
exact. The true result is typically rounded to the nearest representable result. 
true zero


The value +0, represented as exactly 64 zeros in a floating-point register. 


4.7.4 Encodings 


Floating-point numbers are represented with three fields: sign, exponent, and fraction. The
sign is 1 bit; the exponent is 8, 11, or 15 bits; and the fraction is 23, 52, 55, or 112 bits. Some
encodings represent special values:


Sign  Exponent  Fraction  VAX Meaning    VAX Finite  IEEE Meaning  IEEE Finite
x     All-1’s   Non-zero  Finite         Yes         +/-NaN        No
x     All-1’s   0         Finite         Yes         +/-Infinity   No
0     0         Non-zero  Dirty zero     No          +Denormal     No
1     0         Non-zero  Resv. operand  No          -Denormal     No
0     0         0         True zero      Yes         +0            Yes
1     0         0         Resv. operand  No          -0            Yes
x     Other     x         Finite         Yes         finite        Yes
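The IEEE columns of the table can be applied to a T_floating (IEEE double) bit pattern as in the following sketch. The function names are hypothetical, and the field widths (1-bit sign, 11-bit exponent, 52-bit fraction) are those of T_floating; the signaling/quiet split uses the initial fraction bit, as in the Not-a-Number definition.

```python
import struct


def bits_of(x):
    """Return the 64-bit pattern of a Python float (an IEEE double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def classify_t_floating(bits):
    """Return (meaning, is_alpha_finite) for a T_floating bit pattern."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:                      # all-1's exponent
        if fraction == 0:
            return ("Infinity", False)
        quiet = (fraction >> 51) & 1           # initial fraction bit
        return ("Quiet NaN" if quiet else "Signaling NaN", False)
    if exponent == 0:
        return ("Denormal", False) if fraction else ("Zero", True)
    return ("finite", True)
```

Note that both +0 and -0 are classified as finite, while denormals are not, matching the definition of an Alpha finite number.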


The values of MIN and MAX for each of the five floating-point data formats are:


Data Format  MIN                         MAX
F_floating   2**-127 * 0.5               2**127 * (1.0 - 2**-24)
             (0.293873588e-38)           (1.7014117e38)
G_floating   2**-1023 * 0.5              2**1023 * (1.0 - 2**-53)
             (0.5562684646268004e-308)   (0.89884656743115785407e308)
S_floating   2**-126 * 1.0               2**127 * (2.0 - 2**-23)
             (1.17549435e-38)            (3.40282347e38)
T_floating   2**-1022 * 1.0              2**1023 * (2.0 - 2**-52)
             (2.2250738585072013e-308)   (1.7976931348623158e308)
X_floating   2**-16382 * 1.0             2**16383 * (2.0 - 2**-112)
             (See below, note 1)         (See below, note 2)


Note 1: (3.36210314311209350626267781732175260e-4932)
Note 2: (1.18973149535723176508575932662800702e4932)
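The decimal values for the first four formats can be cross-checked from the power-of-two expressions with exact rational arithmetic. This is a sketch using Python's standard `fractions` module; the `LIMITS` dictionary is a hypothetical name, and X_floating is omitted because its range exceeds what a Python float can display.

```python
from fractions import Fraction

# (MIN, MAX) for each format, written exactly as in the table's formulas.
LIMITS = {
    "F_floating": (Fraction(1, 2**127) * Fraction(1, 2),
                   2**127 * (1 - Fraction(1, 2**24))),
    "G_floating": (Fraction(1, 2**1023) * Fraction(1, 2),
                   2**1023 * (1 - Fraction(1, 2**53))),
    "S_floating": (Fraction(1, 2**126),
                   2**127 * (2 - Fraction(1, 2**23))),
    "T_floating": (Fraction(1, 2**1022),
                   2**1023 * (2 - Fraction(1, 2**52))),
}

for name, (lo, hi) in LIMITS.items():
    # Converting to float is only for display; the Fractions stay exact.
    print(f"{name}: MIN = {float(lo):.9e}  MAX = {float(hi):.9e}")
```

On a host whose native double is IEEE binary64, the T_floating limits computed this way coincide with the host's smallest normalized and largest finite double.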
4.7.5 Rounding Modes


All rounding modes map a true result that is exactly representable to that representable value. 


VAX Rounding Modes 


For VAX floating-point operations, two rounding modes are provided and are specified in 
each instruction: normal (biased) rounding and chopped rounding. 


Normal VAX rounding maps the true result to the nearest of two representable results, with
true results exactly halfway between mapped to the larger in absolute value (sometimes called
biased rounding away from zero); maps true results > MAX + 1/2 LSB in magnitude to an
overflow; maps true results < MIN - 1/4 LSB in magnitude to an underflow.


Chopped VAX rounding maps the true result to the smaller in magnitude of two surrounding 
representable results; maps true results > MAX + 1 LSB in magnitude to an overflow; maps 
true results < MIN in magnitude to an underflow. 


IEEE Rounding Modes 


For IEEE floating-point operations, four rounding modes are provided: normal rounding
(unbiased round to nearest), rounding toward minus infinity, rounding toward zero, and rounding
toward plus infinity. The first three can be specified in the instruction. Rounding toward plus
infinity can be obtained by setting the Floating-point Control Register (FPCR) to select it and
then specifying dynamic rounding mode in the instruction (see Section 4.7.8). Alpha IEEE
arithmetic does rounding before detecting overflow/underflow.


Normal IEEE rounding maps the true result to the nearest of two representable results, with
true results exactly halfway between mapped to the one whose fraction ends in 0 (sometimes
called unbiased rounding to even); maps true results > MAX + 1/2 LSB in magnitude to an
overflow; maps true results < MIN - 1/2 LSB in magnitude to an underflow.


Plus infinity IEEE rounding maps the true result to the larger of two surrounding representable
results; maps true results > MAX in magnitude to an overflow; maps positive true results
< +MIN - 1 LSB to an underflow; and maps negative true results > -MIN to an underflow.


Minus infinity IEEE rounding maps the true result to the smaller of two surrounding
representable results; maps true results > MAX in magnitude to an overflow; maps positive true results
< +MIN to an underflow; and maps negative true results > -MIN + 1 LSB to an underflow.


Chopped IEEE rounding maps the true result to the smaller in magnitude of two surrounding 
representable results; maps true results > MAX + 1 LSB in magnitude to an overflow; and 
maps non-zero true results < MIN in magnitude to an underflow. 


Dynamic rounding mode uses the IEEE rounding mode selected by the FPCR register and is
described in more detail in Section 4.7.8. 
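The difference between the rounding modes shows up only when the true result is not representable, and between the two normal modes only on exact ties. The Python sketch below models this with the true result expressed as an exact fraction in units of 1 LSB, so that integers stand for representable results. It is an illustration under that simplifying assumption; it ignores overflow and underflow, and the function and mode names are hypothetical.

```python
from fractions import Fraction
import math


def round_in_lsb_units(true_result, mode):
    """Round a true result (a Fraction in LSB units) to an integer number of LSBs."""
    lower = math.floor(true_result)
    if true_result == lower:
        return lower                       # exactly representable: every mode agrees
    upper = lower + 1
    if mode == "chopped":                  # toward zero (smaller magnitude)
        return lower if true_result > 0 else upper
    if mode == "minus_infinity":
        return lower
    if mode == "plus_infinity":
        return upper
    if mode not in ("vax_normal", "ieee_normal"):
        raise ValueError(mode)
    half = lower + Fraction(1, 2)
    if true_result != half:                # not a tie: nearest wins
        return lower if true_result < half else upper
    if mode == "vax_normal":               # tie: larger in absolute value
        return upper if true_result > 0 else lower
    return lower if lower % 2 == 0 else upper   # tie: result whose last bit is 0
```

For a true result of 2 1/2 LSB, normal VAX rounding gives 3 while normal IEEE rounding gives 2; for 3 1/2 LSB both give 4.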
The following tables summarize the floating-point rounding modes:


VAX Rounding Mode   Instruction Notation
Normal rounding     (No qualifier)
Chopped             /C


IEEE Rounding Mode  Instruction Notation
Normal rounding     (No qualifier)
Dynamic rounding    /D
Plus infinity       /D and ensure that FPCR<DYN> = ‘11’
Minus infinity      /M
Chopped             /C


4.7.6 Computational Models 


The Alpha architecture provides a choice of floating-point computational models. 
There are two computational models available on systems that implement the VAX float- 
ing-point subset: 

• VAX-format arithmetic with precise exceptions
• High-performance VAX-format arithmetic


There are three computational models available on systems that implement the IEEE
floating-point subset:


• IEEE compliant arithmetic
• IEEE compliant arithmetic without inexact exception
• High-performance IEEE-format arithmetic


4.7.6.1 VAX-Format Arithmetic with Precise Exceptions 


This model provides floating-point arithmetic that is fully compatible with the floating-point 
arithmetic provided by the VAX architecture. It provides support for VAX non-finites and 
gives precise exceptions. 


This model is implemented by using VAX floating-point instructions with the /S, /SU, and
/SV trap qualifiers. Each instruction can determine whether it also takes an exception on
underflow or integer overflow. The performance of this model depends on how often computations
involve non-finite operands. Performance also depends on how an Alpha system chooses to
trade off implementation complexity between hardware and operating system completion
handlers (see Section 4.7.7.3).


4.7.6.2 High-Performance VAX-Format Arithmetic


This model provides arithmetic operations on VAX finite numbers. An imprecise arithmetic
trap is generated by any operation that involves non-finite numbers, floating overflow, and
divide-by-zero exceptions.


This model is implemented by using VAX floating-point instructions with a trap qualifier 
other than /S, /SU, or /SV. Each instruction can determine whether it also traps on underflow 
or integer overflow. This model does not require the overhead of an operating system comple- 
tion handler and can be the faster of the two VAX models. 


4.7.6.3 IEEE-Compliant Arithmetic


This model provides floating-point arithmetic that fully complies with the IEEE Standard for 
Binary Floating-Point Arithmetic. It provides all of the exception status flags that are in the 
standard. It provides a default where all traps and faults are disabled and where IEEE 
non-finite values are used in lieu of exceptions. 


Alpha operating systems provide additional mechanisms that allow the user to specify dynami- 
cally which exception conditions should trap and which should proceed without trapping. The 
operating systems also include mechanisms that allow alternative handling of denormal val- 
ues. See Appendix B and the appropriate operating system documentation for a description of 
these mechanisms. 


This model is implemented by using IEEE floating-point instructions with the /SUI
or /SVI trap qualifiers. The performance of this model depends on how often computations
involve inexact results and non-finite operands and results. Performance also depends on how
the Alpha system chooses to trade off implementation complexity between hardware and
operating system completion handlers (see Section 4.7.7.3). This model provides acceptable
performance on Alpha systems that implement the inexact disable (INED) bit in the FPCR.
Performance may be slow if the INED bit is not implemented.


4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception


This model is similar to the model in Section 4.7.6.3, except this model does not signal inexact 
results either by the inexact status flag or by trapping. Combining routines that are compiled 
with this model and routines that are compiled with the model in Section 4.7.6.3 can give an 
application better control over testing when an inexact operation will affect computational 
accuracy. 


This model is implemented by using IEEE floating-point instructions with the /SU or /SV trap
qualifiers. The performance of this model depends on how often computations involve
non-finite operands and results. Performance also depends on how an Alpha system chooses to
trade off implementation complexity between hardware and operating system completion
handlers (see Section 4.7.7.3).


4.7.6.5 High-Performance IEEE-Format Arithmetic


This model provides arithmetic operations on IEEE finite numbers and notifies applications of 
all exceptional floating-point operations. An imprecise arithmetic trap is generated by any 
operation that involves non-finite numbers, floating overflow, divide-by-zero, and invalid 
operations. Underflow results are set to zero. Conversion to integer results that overflow are 
set to the low-order bits of the integer value. 


This model is implemented by using IEEE floating-point instructions with a trap qualifier 
other than /SU, /SV, /SUI, or /SVI. Each instruction can determine whether it also traps on 
underflow or integer overflow. This model does not require the overhead of an operating sys- 
tem completion handler and can be the fastest of the three IEEE models. 
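The correspondence between trap qualifiers and the five computational models of Section 4.7.6 can be summarized as a lookup. The sketch below only restates the qualifier lists given in Sections 4.7.6.1 through 4.7.6.5; the function name computational_model and the string arguments are hypothetical, and only the trap qualifier is considered (rounding qualifiers are ignored).

```python
def computational_model(subset: str, trap_qualifier: str) -> str:
    """Return the Section 4.7.6 model selected by a trap qualifier.

    subset is "VAX" or "IEEE"; trap_qualifier is "" (no qualifier) or a
    string such as "/SU". Both conventions are assumptions of this sketch.
    """
    q = trap_qualifier.upper().lstrip("/")
    if subset == "VAX":
        if q in {"S", "SU", "SV"}:
            return "VAX-format arithmetic with precise exceptions"
        return "High-performance VAX-format arithmetic"
    if subset == "IEEE":
        if q in {"SUI", "SVI"}:
            return "IEEE-compliant arithmetic"
        if q in {"SU", "SV"}:
            return "IEEE-compliant arithmetic without inexact exception"
        return "High-performance IEEE-format arithmetic"
    raise ValueError(f"unknown floating-point subset: {subset}")


if __name__ == "__main__":
    for subset, qual in (("VAX", "/S"), ("IEEE", "/SUI"), ("IEEE", "/U")):
        print(subset, qual or "(none)", "->", computational_model(subset, qual))
```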


4.7.7 Trapping Modes 


There are six exceptions that can be generated by floating-point operate instructions, all sig- 
naled by an arithmetic exception trap. These exceptions are: 


• Invalid operation

• Division by zero

• Overflow

• Underflow

• Inexact result

• Integer overflow (conversion to integer only)


4.7.7.1 VAX Trapping Modes 


This section describes the characteristics of the four VAX trapping modes, which are summa- 
rized in Table 4-8. 
When no trap mode is specified (the default):

• Arithmetic is performed on VAX finite numbers.

• Operations give imprecise traps whenever the following occur:
  - an operand is a non-finite number
  - a floating overflow
  - a divide-by-zero

• Traps are imprecise and it is not always possible to determine which instruction
  triggered a trap or the operands of that instruction.

• An underflow produces a zero result without trapping.

• A conversion to integer that overflows uses the low-order bits of the integer as the
  result without trapping.

• The result of any operation that traps is UNPREDICTABLE.


When /U or /V mode is specified:

• Arithmetic is performed on VAX finite numbers.

• Operations give imprecise traps whenever the following occur:
  - an operand is a non-finite number
  - an underflow
  - an integer overflow
  - a floating overflow
  - a divide-by-zero

• Traps are imprecise and it is not always possible to determine which instruction
  triggered a trap or the operands of that instruction.

• An underflow trap produces a zero result.

• A conversion to integer trapping with an integer overflow produces the low-order bits
  of the integer value.

• The result of any other operation that traps is UNPREDICTABLE.


When /S mode is specified: 


• Arithmetic is performed on all VAX values, both finite and non-finite.

• A VAX dirty zero is treated as zero.

• Exceptions are signaled for:
  - a VAX reserved operand, which generates an invalid operation exception
  - a floating overflow
  - a divide-by-zero

• Exceptions are precise and an application can locate the instruction that caused the
  exception, along with its operand values. See Section 4.7.7.3.

• An operation that underflows produces a zero result without taking an exception.

• A conversion to integer that overflows uses the low-order bits of the integer as the
  result, without taking an exception.

• When an operation takes an exception, the result of the operation is UNPREDICTABLE.


When /SU or /SV mode is specified: 


• Arithmetic is performed on all VAX values, both finite and non-finite.

• A VAX dirty zero is treated as zero.

• Exceptions are signaled for:
  - a VAX reserved operand, which generates an invalid operation exception
  - an underflow
  - an integer overflow
  - a floating overflow
  - a divide-by-zero

• Exceptions are precise and an application can locate the instruction that caused the
  exception, along with its operand values. See Section 4.7.7.3.

• An underflow exception produces a zero.

• A conversion to integer exception with integer overflow produces the low-order bits of
  the integer value.

• The result of any other operation that takes an exception is UNPREDICTABLE.


A summary of the VAX trapping modes, instruction notation, and their meaning follows in
Table 4-8:


Table 4-8: VAX Trapping Modes Summary


Trap Mode                    Notation        Meaning

Underflow disabled           No qualifier    Imprecise
                             /S              Precise exception completion

Underflow enabled            /U              Imprecise
                             /SU             Precise exception completion

Integer overflow disabled    No qualifier    Imprecise
                             /S              Precise exception completion

Integer overflow enabled     /V              Imprecise
                             /SV             Precise exception completion


4.7.7.2 IEEE Trapping Modes


This section describes the characteristics of the four IEEE trapping modes, which are summa- 
rized in Table 4-9. 


When no trap mode is specified (the default): 


• Arithmetic is performed on IEEE finite numbers.

• Operations give imprecise traps whenever the following occur:
  - an operand is a non-finite number
  - a floating overflow
  - a divide-by-zero
  - an invalid operation

• Traps are imprecise, and it is not always possible to determine which instruction
  triggered a trap or the operands of that instruction.

• An underflow produces a zero result without trapping.

• A conversion to integer that overflows uses the low-order bits of the integer as the
  result without trapping.

• When an operation traps, the result of the operation is UNPREDICTABLE.


When /U or /V mode is specified:

• Arithmetic is performed on IEEE finite numbers.

• Operations give imprecise traps whenever the following occur:
  - an operand is a non-finite number
  - an underflow
  - an integer overflow
  - a floating overflow
  - a divide-by-zero
  - an invalid operation

• Traps are imprecise, and it is not always possible to determine which instruction
  triggered a trap or the operands of that instruction.

• An underflow trap produces a zero.

• A conversion to integer trap with an integer overflow produces the low-order bits of the
  integer.

• The result of any other operation that traps is UNPREDICTABLE.


When /SU or /SV mode is specified:

• Arithmetic is performed on all IEEE values, both finite and non-finite.

• Alpha systems support all IEEE features except inexact exception (which requires /SUI
  or /SVI):

  - The IEEE standard specifies a default where exceptions do not fault or trap. In
    combination with the FPCR, this mode allows disabling exceptions and producing
    IEEE-compliant nontrapping results. See Sections 4.7.7.10 and 4.7.7.11.

  - Each Alpha operating system provides a way to optionally signal IEEE floating-point
    exceptions. This mode enables the IEEE status flags that keep a record of
    each exception that is encountered. An Alpha operating system uses the IEEE
    floating-point control (FP_C) quadword, described in Appendix B, to maintain the IEEE
    status flags and to enable calls to IEEE user signal handlers.

• Exceptions signaled in this mode are precise and an application can locate the
  instruction that caused the exception, along with its operand values. See Section 4.7.7.3.


When /SUI or /SVI mode is specified:

• Arithmetic is performed on all IEEE values, both finite and non-finite.

• Inexact exceptions are supported, along with all the other IEEE features supported by
  the /SU or /SV mode.


A summary of the IEEE trapping modes, instruction notation, and their meaning follows in
Table 4-9:


Table 4-9: Summary of IEEE Trapping Modes


Trap Mode                        Notation        Meaning

Underflow disabled and           No qualifier    Imprecise
inexact disabled

Underflow enabled and            /U              Imprecise
inexact disabled                 /SU             Precise exception completion

Underflow enabled and            /SUI            Precise exception completion
inexact enabled

Integer overflow disabled and    No qualifier    Imprecise
inexact disabled

Integer overflow enabled and     /V              Imprecise
inexact disabled                 /SV             Precise exception completion

Integer overflow enabled and     /SVI            Precise exception completion
inexact enabled
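The trap qualifier notation used in Tables 4-8 and 4-9 can be read letter by letter: S selects precise exception completion, U enables underflow, V enables integer overflow, and I enables inexact. The sketch below decodes a qualifier on that reading and rejects combinations that the two tables do not list. The function name decode_trap_qualifier, the set names, and the dictionary keys are hypothetical.

```python
# Qualifier sets as listed in Table 4-8 (VAX) and Table 4-9 (IEEE).
VAX_TRAP_QUALIFIERS = {"", "/U", "/V", "/S", "/SU", "/SV"}
IEEE_TRAP_QUALIFIERS = {"", "/U", "/V", "/SU", "/SV", "/SUI", "/SVI"}


def decode_trap_qualifier(subset: str, qualifier: str) -> dict:
    """Decode a trap qualifier such as "/SUI" into its trapping-mode flags."""
    if subset == "VAX":
        valid = VAX_TRAP_QUALIFIERS
    elif subset == "IEEE":
        valid = IEEE_TRAP_QUALIFIERS
    else:
        raise ValueError(f"unknown floating-point subset: {subset}")
    if qualifier not in valid:
        raise ValueError(f"{qualifier!r} is not a listed {subset} trap qualifier")
    letters = qualifier.lstrip("/")
    return {
        "precise": "S" in letters,                  # exception completion
        "underflow_enabled": "U" in letters,
        "integer_overflow_enabled": "V" in letters,
        "inexact_enabled": "I" in letters,
    }


if __name__ == "__main__":
    print(decode_trap_qualifier("IEEE", "/SUI"))
    print(decode_trap_qualifier("VAX", ""))
```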


4.7.7.3 Arithmetic Trap Completion 


Because floating-point instructions may be pipelined, the trap PC can be an arbitrary number 
of instructions past the one triggering the trap. Those instructions that are executed after the 
trigger instruction of an arithmetic trap are collectively referred to as the trap shadow of the 
trigger instruction. 


Marking floating-point instructions for exception completion with any valid qualifier combina- 
tion that includes the /S qualifier enables the completion of the triggering instruction. For any 
instruction so marked, the output register for the triggering instruction cannot also be one of 
the input registers, so that an input register cannot be overwritten and the input value is avail- 
able after a trap occurs. 


See Appendix B, Alpha Support for Operating System Completion Handlers, for more 
information. 


The AMASK instruction reports how the arithmetic trap should be completed: 


• If AMASK returns with bit 9 clear, floating-point traps are imprecise. Exception
  completion requires that generated code must obey the trap shadow rules in Section
  4.7.7.3.1, with a trap shadow length as described in Section 4.7.7.3.2.


• If AMASK returns with bit 9 set, the hardware implements precise floating-point traps.
  If the instruction has any valid qualifier combination that includes /S, the trap PC points
  to the instruction that immediately follows the instruction that triggered the trap. The
  trap shadow contains zero instructions; exception completion does not require that the
  generated code follow the conditions in Section 4.7.7.3.1 and the length rules in Section
  4.7.7.3.2.


\Implementation Note: 
Hardware is strongly encouraged to implement precise traps.\ 


4.7.7.3.1 Trap Shadow Rules 


For an operating system (OS) completion handler to complete non-finite operands and excep- 
tions, the following conditions must hold. 


Conditions 1 and 2, below, allow an OS completion handler to locate the trigger instruction by
doing a linear scan backwards from the trap PC while comparing destination registers in the
trap shadow with the registers that are specified in the register write mask parameter to the
arithmetic trap.


Condition 3 allows an OS completion handler to emulate the trigger instruction with its
original input operand values.


Condition 4 allows the handler to re-execute instructions in the trap shadow with their original 
operand values. 


Condition 5 prevents any unusual side effects that would cause problems on repeated execu- 
tion of the instructions in the trap shadow. 


Conditions: 


1. The destination register of the trigger instruction may not be used as the destination reg- 
ister of any instruction in the trap shadow. 


2. The trap shadow may not include any branch or jump instructions.


3. An instruction in the trap shadow may not modify an input to the trigger instruction.


4. The value in a register or memory location that is used as input to some instruction in 
the trap shadow may not be modified by a subsequent instruction in the trap shadow 
unless that value is produced by an earlier instruction in the trap shadow. 


5. The trap shadow may not contain any instructions with side effects that interact with 
earlier instructions in the trap shadow or with other parts of the system. Examples of 
operations with prohibited side effects are: 


- Modifications of the stack pointer or frame pointer that can change the accessibility
  of stack variables and the exception context that is used by earlier instructions in
  the trap shadow.


- Modifications of volatile values and access to I/O device registers.


- If order of exception reporting is important, taking an arithmetic trap by an integer
  instruction or by a floating-point instruction that does not include a /S qualifier,
  either of which can report exceptions out of order.


An instruction may be in the trap shadows of multiple instructions that include a /S qualifier. 
That instruction must obey all conditions for all those trap shadows. For example, the destina- 
tion register of an instruction in multiple trap shadows must be different than the destination 
registers of each possible trigger instruction. 
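A compiler or code checker could test a candidate trap shadow against conditions 1 through 4 along the following lines. This is a minimal sketch under stated assumptions, not part of the architecture: instructions are modeled only by an opcode name, one destination register, and a tuple of source registers; memory operands and condition 5 are ignored; and BRANCH_OPS is an illustrative, incomplete opcode list. The names Insn and violated_conditions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative subset of branch and jump opcodes; not a complete list.
BRANCH_OPS = {"BR", "BSR", "BEQ", "BNE", "FBEQ", "FBNE", "JMP", "JSR", "RET"}


@dataclass
class Insn:
    op: str
    dest: Optional[str]            # destination register, or None
    srcs: Tuple[str, ...] = ()     # source registers


def violated_conditions(trigger: Insn, shadow: List[Insn]) -> List[int]:
    """Return the numbers of conditions 1-4 that the shadow violates."""
    violated = set()
    produced = set()        # registers written earlier in the shadow
    live_in_reads = set()   # registers read before being written in the shadow
    for insn in shadow:
        if insn.op in BRANCH_OPS:
            violated.add(2)
        live_in_reads.update(s for s in insn.srcs if s not in produced)
        if insn.dest is not None:
            if insn.dest == trigger.dest:
                violated.add(1)
            if insn.dest in trigger.srcs:
                violated.add(3)
            if insn.dest in live_in_reads:
                violated.add(4)
            produced.add(insn.dest)
    return sorted(violated)


if __name__ == "__main__":
    trigger = Insn("ADDT/SU", "f3", ("f1", "f2"))
    shadow = [Insn("CPYS", "f1", ("f6", "f6")), Insn("BR", None)]
    print(violated_conditions(trigger, shadow))
```

An empty result means only that the modeled conditions hold; a real checker would also have to track memory locations and the side effects listed under condition 5.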


4.7.7.3.2 Trap Shadow Length Rules 


The trap shadow length rules in Table 4-10 apply only to those floating-point instructions
with any valid qualifier combination that includes a /S trap qualifier. Further, the instruction to
which the trap shadow extends is not part of the trap shadow and that instruction is not
executed prior to the arithmetic trap that is signaled by the trigger instruction.


Implementation notes: 


• On Alpha implementations for which the IMPLVER instruction returns the value 0, the
  trap shadow of an instruction may extend after the result is consumed by a
  floating-point STx instruction. On all other implementations, the trap shadow ends when a
  result is consumed.


• Because Alpha implementations need not execute instructions that have R31 or F31 as
  the destination operand, instructions with such a destination should not be thought to
  end a trap shadow.


Table 4-10: Trap Shadow Length Rules


Floating-Point             Trap Shadow Extends Until Any of the Following
Instruction Group          Occurs:

Floating-point operate     • Encountering a CALL_PAL, EXCB, or TRAPB
(except DIVx and SQRTx)      instruction.

                           • The result is consumed by any instruction except
                             floating-point STx.

                           • The fourth instruction† after the result is consumed by
                             a floating-point STx instruction.

                             Or, following the floating-point STx of the result,
                             the result of a LDx that loads the stored value is
                             consumed by any instruction.

                           • The result of a subsequent floating-point operate
                             instruction is consumed by any instruction except
                             floating-point STx.

                           • The second instruction† after the result of a
                             subsequent floating-point operate instruction is consumed
                             by a floating-point STx instruction.

                           • The result of a subsequent floating-point DIVx or
                             SQRTx instruction is consumed by any instruction.

Floating-point DIVx        • Encountering a CALL_PAL, EXCB, or TRAPB
                             instruction.

                           • The result is consumed by any instruction except
                             floating-point STx.

                           • The fourth instruction† after the result is consumed by
                             a floating-point STx instruction.

                             Or, following the floating-point STx of the result,
                             the result of a LDx that loads the stored value is
                             consumed by any instruction.

                           • The result of a subsequent floating-point DIVx is
                             consumed by any instruction.

Floating-point SQRTx       • Encountering a CALL_PAL, EXCB, or TRAPB
                             instruction.

                           • The result is consumed by any instruction.

                           • The result of a subsequent SQRTx instruction is
                             consumed by any instruction.


† The length of four instructions is a conservative estimate of how far the trap shadow may
extend past a consuming floating-point STx instruction. The length of two instructions is a
conservative estimate of how far the trap shadow may extend after a subsequent
floating-point operate instruction is consumed by a floating-point STx instruction. Compilers can
make a more precise estimate by consulting the DECchip 21064 and DECchip 21064A
Alpha AXP Microprocessors Hardware Reference Manual, EC-QD2RA-TE.


4.7.7.4 Invalid Operation (INV) Arithmetic Trap 


An invalid operation arithmetic trap is signaled if an operand is a non-finite number or if an 
operand is invalid for the operation to be performed. (Note that CMPTxy does not trap on plus 
or minus infinity.) Invalid operations are: 


• Any operation on a signaling NaN.

• Addition of unlike-signed infinities or subtraction of like-signed infinities, such as
  (+infinity + -infinity) or (+infinity - +infinity).

• Multiplication of 0*infinity.

• IEEE division of 0/0 or infinity/infinity.

• Conversion of an infinity or NaN to an integer.

• CMPTLE or CMPTLT when either operand is a NaN.

• SQRTx of a negative non-zero number.


The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically
disable the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.
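The arithmetic cases in the list of invalid operations can be expressed as a predicate over operand values. The sketch below is illustrative only: it uses host double-precision values, covers only add, subtract, multiply, divide, and square root, and deliberately leaves out NaN operands, comparisons, and conversions to integer, which cannot be modeled faithfully this way. The function name is_invalid_operation and the operation strings are hypothetical.

```python
import math


def is_invalid_operation(op: str, a: float, b: float = 0.0) -> bool:
    """Report whether op(a, b) is one of the listed invalid operations."""
    inf_a, inf_b = math.isinf(a), math.isinf(b)
    if op == "add":    # unlike-signed infinities
        return inf_a and inf_b and a != b
    if op == "sub":    # like-signed infinities
        return inf_a and inf_b and a == b
    if op == "mul":    # 0 * infinity, in either order
        return (a == 0 and inf_b) or (inf_a and b == 0)
    if op == "div":    # 0/0 or infinity/infinity
        return (a == 0 and b == 0) or (inf_a and inf_b)
    if op == "sqrt":   # negative non-zero operand; -0.0 is not invalid
        return a < 0
    raise ValueError(f"unsupported operation: {op}")


if __name__ == "__main__":
    inf = math.inf
    print(is_invalid_operation("add", inf, -inf))
    print(is_invalid_operation("div", 1.0, 0.0))
```

Note that a non-zero finite number divided by zero is not classified as invalid here; that case is a division by zero rather than an invalid operation.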


IEEE-compliant system software must also supply an invalid operation indication to the user 
for x REM 0 and for conversions to integer that take an integer overflow trap. 

If an implementation does not support the DZED (division by zero disable) bit, it may respond
to the IEEE division of 0/0 by delivering a division by zero trap to the operating system,
which IEEE-compliant software must change to an invalid operation trap for the user.


An implementation may choose not to take an INV trap for a valid IEEE operation that
involves denormal operands if:


• The instruction is modified by any valid qualifier combination that includes the /S
  (exception completion) qualifier.


• The implementation supports the DNZ (denormal operands to zero) bit and DNZ is set.


• The instruction produces the result and exceptions required by Section 4.7.10, as
  modified by the DNZ bit described in Section 4.7.7.11.


An implementation may choose not to take an INV trap for a valid IEEE operation that
involves denormal operands, and direct hardware implementation of denormal arithmetic is
permitted if:


• The instruction is modified by any valid qualifier combination that includes the /S
  (exception completion) qualifier.


• The implementation supports both the DNOD (denormal operand exception disable) bit
  and the DNZ (denormal operands to zero) bit and DNOD is set while DNZ is clear.


• The instruction produces the result and exceptions required by Section 4.7.10, possibly
  modified by the UNDZ bit described in Section 4.7.7.11.


Regardless of the setting of the INVD (invalid operation disable) bit, the implementation may
choose not to trap on valid operations that involve quiet NaNs and infinities as operands for
IEEE instructions that are modified by any valid qualifier combination that includes the /S
(exception completion) qualifier.


4.7.7.5 Division by Zero (DZE) Arithmetic Trap


A division by zero arithmetic trap is taken if the numerator does not cause an invalid operation
trap and the denominator is zero.


The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically
disable the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.


If an implementation does not support the DZED (division by zero disable) bit, it may respond
to the IEEE division of 0/0 by delivering a division by zero trap to the operating system,
which IEEE-compliant software must change to an invalid operation trap for the user.


4.7.7.6 Overflow (OVF) Arithmetic Trap


An overflow arithmetic trap is signaled if the rounded result exceeds in magnitude the largest 
finite number of the destination format. 


The instruction cannot disable the trap and, if the trap occurs, an UNPREDICTABLE value is
stored in the result register. However, under some conditions, the FPCR can dynamically
disable the trap, as described in Section 4.7.7.10, producing a correct IEEE result, as described in
Section 4.7.10.


4.7.7.7 Underflow (UNF) Arithmetic Trap


An underflow occurs if the rounded result is smaller in magnitude than the smallest finite num- 
ber of the destination format. 


If an underflow occurs, a true zero (64 bits of zero) is always stored in the result register. In
the case of an IEEE operation that takes an underflow arithmetic trap, a true zero is stored
even if the result after rounding would have been -0 (underflow below the negative denormal
range).


If an underflow occurs and underflow traps are enabled by the instruction, an underflow arith- 
metic trap is signaled. However, under some conditions, the FPCR can dynamically disable 
the trap, as described in Section 4.7.7.10, producing the result described in Section 4.7.10, as 
modified by the UNDZ bit described in Section 4.7.7.11. 


4.7.7.8 Inexact Result (INE) Arithmetic Trap 


An inexact result occurs if the infinitely precise result differs from the rounded result. 


If an inexact result occurs, the normal rounded result is still stored in the result register. If an 
inexact result occurs and inexact result traps are enabled by the instruction, an inexact result 
arithmetic trap is signaled. However, under some conditions, the FPCR can dynamically dis- 
able the trap; see Section 4.7.7.10 for information. 


4.7.7.9 Integer Overflow (IOV) Arithmetic Trap 


In conversions from floating to quadword integer, an integer overflow occurs if the rounded
result is outside the range -2**63..2**63-1. In conversions from quadword integer to
longword integer, an integer overflow occurs if the result is outside the range -2**31..2**31-1.


If an integer overflow occurs in CVTxQ or CVTQL, the true result truncated to the low-order 
64 or 32 bits respectively is stored in the result register. 


If an integer overflow occurs and integer overflow traps are enabled by the instruction, an
integer overflow arithmetic trap is signaled.
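The range checks and the low-order truncation described above amount to simple two's complement arithmetic. The sketch below models only that arithmetic on an already rounded integer result; it does not model how CVTxQ or CVTQL lay the value out in a floating-point register. The function name truncate_signed is hypothetical.

```python
def truncate_signed(value: int, bits: int):
    """Return (low-order bits as a signed integer, overflow flag).

    bits is 64 for a quadword result or 32 for a longword result.
    """
    low_bound = -(2 ** (bits - 1))
    high_bound = 2 ** (bits - 1) - 1
    overflow = not (low_bound <= value <= high_bound)
    low = value & (2 ** bits - 1)          # keep the low-order bits
    if low >= 2 ** (bits - 1):             # reinterpret as two's complement
        low -= 2 ** bits
    return low, overflow


if __name__ == "__main__":
    print(truncate_signed(2 ** 63, 64))
    print(truncate_signed(2 ** 31 + 5, 32))
    print(truncate_signed(-1, 64))
```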


4.7.7.10 IEEE Floating-Point Trap Disable Bits 


In the case of IEEE exception completion modes, any of the traps described in Sections 4.7.7.4 
through 4.7.7.9 may be disabled by setting the appropriate trap disable bit in the FPCR. The 
trap disable bits only affect the IEEE trap modes when the instruction is modified by any valid 
qualifier combination that includes the /S (exception completion) qualifier. The trap disable 
bits (DNOD, DZED, INED, INVD, OVFD, and UNFD) do not affect any of the VAX trap 
modes. 


If a trap disable bit is set and the corresponding trap condition occurs, the hardware implemen- 
tation sets the result of the operation to the nontrapping result value as specified in the IEEE 
standard and Section 4.7.10 and modified by the denormal control bits. If the implementation 
is unable to calculate the required result, it ignores the trap disable bit and signals a trap as 
usual. 


Note that a hardware implementation may choose to support any subset of the trap disable
bits, including the empty subset.


4.7.7.11 IEEE Denormal Control Bits


In the case of IEEE exception completion modes, the handling of denormal operands and 
results is controlled by the DNZ and UNDZ bits in the FPCR. These denormal control bits 
only affect denormal handling by IEEE instructions that are modified by any valid qualifier 
combination that includes the /S (exception completion) qualifier. 


The denormal control bits apply only to the IEEE operate instructions: ADD, SUB, MUL,
DIV, SQRT, CMPxx, and CVT with floating-point source operand.


If both the UNFD (underflow disable) bit and the UNDZ (underflow to zero) bit are set in the
FPCR, the implementation sets the result of an underflow operation to a true zero result. The
zeroing of a denormal result by UNDZ must also be treated as an inexact result.


If the DNZ (denormal operands to zero) bit is set in the FPCR, the implementation treats each 
denormal operand as if it were a signed zero value. The source operands in the register are not 
changed. If DNZ is set, IEEE operations with any valid qualifier combination that includes a 
/S qualifier signal arithmetic traps as if any denormal operand were zero; that is, with DNZ 
set: 


• An IEEE operation with a denormal operand never generates an overflow, underflow, or
  inexact result arithmetic trap.


• Dividing by a denormal operand is a division by zero or invalid operation as appropriate.


• Multiplying a denormal by infinity is an invalid operation.


• A SQRT of a negative denormal produces a -0 instead of an invalid operation.


• A denormal operand, treated as zero, does not take the denormal operand exception trap
  controlled by the DNOD bit in the FPCR.


Note that a hardware implementation may choose to support any subset of the denormal con- 
trol bits, including the empty subset. 
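The effect of DNZ on an operand can be pictured as replacing each denormal input by a zero of the same sign before the operation is evaluated. The sketch below assumes a 64-bit IEEE double (corresponding to T_floating) and inspects its exponent and fraction fields directly; it models the operand substitution only, not trap signaling. The names is_denormal and apply_dnz are hypothetical.

```python
import math
import struct


def is_denormal(x: float) -> bool:
    """True if x is a denormal double: zero exponent field, non-zero fraction."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def apply_dnz(x: float) -> float:
    """Treat a denormal operand as a zero with the same sign; pass others through."""
    return math.copysign(0.0, x) if is_denormal(x) else x


if __name__ == "__main__":
    tiny = -5e-324                       # negative denormal
    z = apply_dnz(tiny)
    print(z, math.copysign(1.0, z))      # the zero keeps the operand's sign
```

The source register itself is left unchanged by DNZ; accordingly, apply_dnz returns a new value rather than modifying its argument.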


4.7.8 Floating-Point Control Register (FPCR) 


When an IEEE floating-point operate instruction specifies dynamic mode (/D) in its function
field (function field bits <12:11> = 11), the rounding mode to be used for the instruction is
derived from the FPCR register. The layout of the rounding mode bits and their assignments
matches exactly the format used in the 11-bit function field of the floating-point operate
instructions. The function field is described in Section 4.7.9.


In addition, the FPCR gives a summary of each exception type for the exception conditions 
detected by all IEEE floating-point operates thus far, as well as an overall summary bit that 
indicates whether any of these exception conditions has been detected. The individual excep- 
tion bits match exactly in purpose and order the exception bits found in the exception 
summary quadword that is pushed for arithmetic traps. However, for each instruction, these 
exception bits are set independent of the trapping mode specified for the instruction.
Therefore, even though trapping may be disabled for a certain exceptional condition, the fact that
the exceptional condition was encountered by an instruction is still recorded in the FPCR.


Floating-point operates that belong to the IEEE subset and CVTQL, which belongs to both
VAX and IEEE subsets, appropriately set the FPCR exception bits. It is UNPREDICTABLE
whether floating-point operates that belong only to the VAX floating-point subset set the
FPCR exception bits.


Alpha floating-point hardware only transitions these exception bits from zero to one. Once set
to one, these exception bits are only cleared when software writes zero into these bits by
writing a new value into the FPCR.

Section 4.7.2 allows certain of the FPCR bits to be subsetted. 

The format of the FPCR is shown in Figure 4—1 and described in Table 4-11. 

Figure 4-1: Floating-Point Control Register (FPCR) Format


Table 4-11: Floating-Point Control Register (FPCR) Bit Descriptions

Bit     Description (Meaning When Set)

63      Summary Bit (SUM). Records bitwise OR of FPCR exception bits. Equal to
        FPCR<57 | 56 | 55 | 54 | 53 | 52>.

62      Inexact Disable (INED)†. Suppress INE trap and place correct IEEE nontrapping
        result in the destination register.

61      Underflow Disable (UNFD)†. Suppress UNF trap and place correct IEEE nontrapping
        result in the destination register if the implementation is capable of producing
        correct IEEE nontrapping result. The correct result value is determined
        according to the value of the UNDZ bit.

60      Underflow to Zero (UNDZ)†. When set together with UNFD, on underflow, the
        hardware places a true zero (64 bits of zero) in the destination register rather than
        the result specified by the IEEE standard.

59-58   Dynamic Rounding Mode (DYN). Indicates the rounding mode to be used by an
        IEEE floating-point operate instruction when the instruction's function field
        specifies dynamic mode (/D). Assignments are:

        DYN   IEEE Rounding Mode Selected
        00    Chopped rounding mode
        01    Minus infinity
        10    Normal rounding
        11    Plus infinity

57      Integer Overflow (IOV). An integer arithmetic operation or a conversion from
        floating to integer overflowed the destination precision.

56      Inexact Result (INE). A floating arithmetic or conversion operation gave a result
        that differed from the mathematically exact result.

55      Underflow (UNF). A floating arithmetic or conversion operation underflowed the
        destination exponent.

54      Overflow (OVF). A floating arithmetic or conversion operation overflowed the
        destination exponent.

53      Division by Zero (DZE). An attempt was made to perform a floating divide
        operation with a divisor of zero.

52      Invalid Operation (INV). An attempt was made to perform a floating arithmetic,
        conversion, or comparison operation, and one or more of the operand values were
        illegal.

51      Overflow Disable (OVFD)†. Suppress OVF trap and place correct IEEE nontrapping
        result in the destination register if the implementation is capable of producing
        correct IEEE nontrapping results.

50      Division by Zero Disable (DZED)†. Suppress DZE trap and place correct IEEE
        nontrapping result in the destination register if the implementation is capable of
        producing correct IEEE nontrapping results.

49      Invalid Operation Disable (INVD)†. Suppress INV trap and place correct IEEE
        nontrapping result in the destination register if the implementation is capable of
        producing correct IEEE nontrapping results.

48      Denormal Operands to Zero (DNZ)†. Treat all denormal operands as a signed zero
        value with the same sign as the denormal.

47      Denormal Operand Exception Disable (DNOD)†. Suppress INV trap for valid
        operations that involve denormal operand values and place the correct IEEE
        nontrapping result in the destination register if the implementation is capable of
        processing the denormal operand. If the result of the operation underflows, the
        correct result is determined according to the value of the UNDZ bit. If DNZ is set,
        DNOD has no effect because a denormal operand is treated as having a zero value
        instead of a denormal value.

46-0    Reserved. Read as Zero. Ignored when written.

† Bit only has meaning for IEEE instructions when any valid qualifier combination that
  includes exception completion (/S) is specified.


The FPCR is read into and written from the floating-point registers by the MF_FPCR and MT_FPCR
instructions, respectively, which are described in Section 4.7.8.1.
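
As an illustration of the layout in Table 4-11, the following Python sketch splits a 64-bit FPCR image into named fields. The bit positions come from the table; the names `decode_fpcr` and `summary_is_consistent` are hypothetical helpers introduced here, not part of any Alpha software interface.

```python
# Bit positions taken from Table 4-11; helper names are illustrative only.
FPCR_FLAGS = {
    "SUM": 63, "INED": 62, "UNFD": 61, "UNDZ": 60,
    "IOV": 57, "INE": 56, "UNF": 55, "OVF": 54, "DZE": 53, "INV": 52,
    "OVFD": 51, "DZED": 50, "INVD": 49, "DNZ": 48, "DNOD": 47,
}
DYN_MODES = {
    0b00: "chopped",
    0b01: "minus infinity",
    0b10: "normal",
    0b11: "plus infinity",
}
EXCEPTION_BITS = ("IOV", "INE", "UNF", "OVF", "DZE", "INV")


def decode_fpcr(value: int) -> dict:
    """Split a 64-bit FPCR image into named flags and the DYN rounding mode."""
    fields = {name: bool((value >> bit) & 1) for name, bit in FPCR_FLAGS.items()}
    fields["DYN"] = DYN_MODES[(value >> 58) & 0b11]
    return fields


def summary_is_consistent(value: int) -> bool:
    """Check that SUM (bit 63) equals the OR of exception bits <57:52>."""
    fields = decode_fpcr(value)
    return fields["SUM"] == any(fields[name] for name in EXCEPTION_BITS)
```

For example, `decode_fpcr((1 << 63) | (1 << 56) | (0b10 << 58))` reports SUM and INE set with normal rounding selected.
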
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FPCR and the instructions to access it are required for an implementation that supports float- 
ing-point (see Section 4.7.8). On implementations that do not support floating-point, the 
instructions that access FPCR (MF_FPCR and MT_FPCR) take an Illegal Instruction Trap.


Software Note: 


Support for FPCR is required on a system that supports the OpenVMS Alpha operating 
system even if that system does not support floating-point. 


4.7.8.1 Accessing the FPCR 


Because Alpha floating-point hardware can overlap the execution of a number of float- 
ing-point instructions, accessing the FPCR must be synchronized with other floating-point 
instructions. An EXCB instruction must be issued both prior to and after accessing the FPCR 
to ensure that the FPCR access is synchronized with the execution of previous and subsequent 
floating-point instructions; otherwise synchronization is not ensured. 


Issuing an EXCB followed by an MT_FPCR followed by another EXCB ensures that only 
floating-point instructions issued after the second EXCB are affected by and affect the new 
value of the FPCR. Issuing an EXCB followed by an MF_FPCR followed by another EXCB 
ensures that the value read from the FPCR only records the exception information for float- 
ing-point instructions issued prior to the first EXCB. 


Consider the following example: 


    ADDT/D
    EXCB                    ;1
    MT_FPCR F1,F1,F1
    EXCB                    ;2
    SUBT/D


Without the first EXCB, it is possible in an implementation for the ADDT/D to execute in par- 
allel with the MT_FPCR. Thus, it would be UNPREDICTABLE whether the ADDT/D was 
affected by the new rounding mode set by the MT_FPCR and whether fields cleared by the 
MT_FPCR in the exception summary were subsequently set by the ADDT/D. 


Without the second EXCB, it is possible in an implementation for the MT_FPCR to execute in 
parallel with the SUBT/D. Thus, it would be UNPREDICTABLE whether the SUBT/D was 
affected by the new rounding mode set by the MT_FPCR and whether fields cleared by the 
MT_FPCR in the exception summary field of FPCR were previously set by the SUBT/D. 


Specifically, code should issue an EXCB before and after it accesses the FPCR if that code 
needs to see valid values in FPCR bits <63> and <57:52>. An EXCB should be issued before 
attempting to write the FPCR if the code expects changes to bits <59:52> not to have depen- 
dencies with prior instructions. An EXCB should be issued after attempting to write the FPCR 
if the code expects subsequent instructions to have dependencies with changes to bits <59:52>. 
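
The bracketing rule can be sketched as a small check over a list of mnemonics. This is only an illustration: the function name `unsynchronized_fpcr_accesses` is hypothetical, and requiring the EXCB to be immediately adjacent to the FPCR access is a simplifying assumption that is stricter than the architecture requires.

```python
FPCR_ACCESS = {"MT_FPCR", "MF_FPCR"}


def unsynchronized_fpcr_accesses(mnemonics):
    """Return indexes of FPCR accesses that lack an EXCB directly before or after.

    Assumption: adjacency is used as a simple stand-in for "an EXCB separates
    the access from earlier and later floating-point instructions".
    """
    bad = []
    for i, op in enumerate(mnemonics):
        if op not in FPCR_ACCESS:
            continue
        before = i > 0 and mnemonics[i - 1] == "EXCB"
        after = i + 1 < len(mnemonics) and mnemonics[i + 1] == "EXCB"
        if not (before and after):
            bad.append(i)
    return bad
```

Applied to the example sequence, the check reports no problems; with the first EXCB removed, it flags the MT_FPCR.
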
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4.7.8.2 Default Values of the FPCR 
Processor initialization leaves the value of FPCR UNPREDICTABLE.


Software Note: 


DIGITAL software should initialize FPCR<DYN> = 10 during program activation. Using
this default, a program can be coded to use only dynamic rounding without the need to 
explicitly set the rounding mode to normal rounding in its start-up code. 


Program activation normally clears all other fields in the FPCR. However, this behavior 
may depend on the operating system. 
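
A minimal sketch of the activation value described in this note, assuming the DYN field sits at FPCR<59:58>. The helper names `with_rounding_mode` and `activation_fpcr` are hypothetical, and whether other fields are cleared remains operating-system dependent as stated above.

```python
MASK64 = (1 << 64) - 1
DYN_SHIFT = 58          # DYN occupies FPCR<59:58>
DYN_NORMAL = 0b10       # "Normal rounding" encoding


def with_rounding_mode(fpcr: int, dyn: int) -> int:
    """Return fpcr with its DYN field replaced by the 2-bit value dyn."""
    if dyn not in range(4):
        raise ValueError("DYN is a 2-bit field")
    cleared = fpcr & ~(0b11 << DYN_SHIFT) & MASK64
    return cleared | (dyn << DYN_SHIFT)


def activation_fpcr() -> int:
    """FPCR image with DYN = 10 and every other field cleared (assumed default)."""
    return with_rounding_mode(0, DYN_NORMAL)
```
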


4.7.8.3 Saving and Restoring the FPCR 


The FPCR must be saved and restored across context switches so that the FPCR value of one 
process does not affect the rounding behavior and exception summary of another process. 


The dynamic rounding mode put into effect by the programmer (or initialized by image activa- 
tion) is valid for the entirety of the program and remains in effect until subsequently changed 
by the programmer or until image run-down occurs. 


Software Notes: 


The following software notes apply to saving and restoring the FPCR: 


1. The IEEE standard precludes saving and restoring the FPCR across subroutine calls.


2. The IEEE standard requires that an implementation provide status flags that are set
whenever the corresponding conditions occur and are reset only at the user’s request. 
The exception bits in the FPCR do not satisfy that requirement, because they can be 
spuriously set by instructions in a trap shadow that should not have been executed had 
the trap been taken synchronously. 


The IEEE status flags can be provided by software (as software status bits) as follows: 


Trap interface software (usually the operating system) keeps a set of software 
status bits and a mask of the traps that the user wants to receive. Code is generated 
with the /SUI qualifiers. For a particular exception, the software clears the 
corresponding trap disable bit if either the corresponding software status bit is 0 or 
if the user wants to receive such traps. If a trap occurs, the software locates the 
offending instruction in the trap shadow, simulates it and sets any of the software 
status bits that are appropriate. Then, the software either delivers the trap to the 
user program or disables further delivery of such traps. The user program must 
interface to this trap interface software to set or clear any of the software status 
bits or to enable or disable floating-point traps. See Appendix B. 


When such a scheme is being used, the trap disable bits and denormal control bits 
should be modified only by the trap interface software. If the disable bits are 
spuriously cleared, unnecessary traps may occur. If they are spuriously set, the 
software may fail to set the correct values in the software status bits. Programs should 
call routines in the trap interface software to set or clear bits in the FPCR. 
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DIGITAL software may choose to initialize the software status bits and the trap 
disable bits to all 1’s to avoid any initial trapping when an exception condition first 
occurs. Or, software may choose to initialize those bits to all 0’s in order to provide a 
summary of the exception behavior when the program terminates. 


In any event, the exception bits in the FPCR are still useful to programs. A program 
can clear all of the exception bits in the FPCR, execute a single floating-point 
instruction, and then examine the status bits to determine which hardware-defined 
exceptions the instruction encountered. For this operation to work in the presence of 
various implementation options, the single instruction should be followed by a 
TRAPB or EXCB instruction, and exception completion by the system software 
should save and restore the FPCR registers without other modifications. 


3. Because of the way the LDS and STS instructions manipulate bits <61:59> of float- 
ing-point registers, they should not be used to manipulate FPCR values. 
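
The rule that trap interface software applies to the trap disable bits in the notes above can be sketched as follows. This is an illustration under assumptions, not the operating system's implementation: the function `trap_disable_bits` and the use of Python sets for the software status bits and the user's trap mask are hypothetical.

```python
# Trap disable bit positions in the FPCR for each exception.
DISABLE_BIT = {"INV": 49, "DZE": 50, "OVF": 51, "UNF": 61, "INE": 62}


def trap_disable_bits(status: set, enabled: set) -> int:
    """Compute the FPCR trap disable bits.

    A disable bit is cleared if the software status bit is 0 or the user
    wants to receive the trap; otherwise it is set.
    """
    bits = 0
    for exc, bit in DISABLE_BIT.items():
        if exc in status and exc not in enabled:
            bits |= 1 << bit
    return bits
```

With an empty status set every disable bit is clear, so the first occurrence of each exception traps and lets the software record it.
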


4.7.9 Floating-Point Instruction Function Field Format 


The function code for IEEE and VAX floating-point instructions, bits <15..5>, contains the
function field. That field is shown in Figure 4-2 and described for IEEE floating-point in
Table 4-12 and for VAX floating-point in Table 4-13. Function codes for the independent
floating-point instructions, those with opcode 17₁₆, do not correspond to the function fields
below.


The function field contains subfields that specify the trapping and rounding modes that are 
enabled for the instruction, the source datatype, and the instruction class. 


Figure 4-2: Floating-Point Instruction Function Field


Table 4-12: IEEE Floating-Point Function Field Bit Summary

Bits    Field   Meaning†

15-13   TRP     Trapping modes:

                Contents   Meaning for Opcodes 14₁₆ and 16₁₆
                000        Imprecise (default)
                001        Underflow enable (/U) - floating-point output
                           Integer overflow enable (/V) - integer output
                010        UNPREDICTABLE for opcode 16₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                011        UNPREDICTABLE for opcode 16₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                100        UNPREDICTABLE for opcode 16₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                101        /SU - floating-point output
                           /SV - integer output
                110        UNPREDICTABLE for opcode 16₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                111        /SUI - floating-point output
                           /SVI - integer output

12-11   RND     Rounding modes:

                Contents   Meaning for Opcodes 16₁₆ and 14₁₆
                00         Chopped (/C)
                01         Minus infinity (/M)
                10         Normal (default)
                11         Dynamic (/D)

10-9    SRC     Source datatype:

                Contents   Meaning for     Meaning for
                           Opcode 16₁₆     Opcode 14₁₆
                00         S_floating      S_floating
                01         Reserved        Reserved
                10         T_floating      T_floating
                11         Q_fixed         Reserved

8-5     FNC     Instruction class:

                Contents   Meaning for     Meaning for
                           Opcode 16₁₆     Opcode 14₁₆
                0000       ADDx            Reserved
                0001       SUBx            Reserved
                0010       MULx            Reserved
                0011       DIVx            Reserved
                0100       CMPxUN          ITOFS/ITOFT
                0101       CMPxEQ          Reserved
                0110       CMPxLT          Reserved
                0111       CMPxLE          Reserved
                1000       Reserved        Reserved
                1001       Reserved        Reserved
                1010       Reserved        Reserved
                1011       Reserved        SQRTS/SQRTT
                1100       CVTxS           Reserved
                1101       Reserved        Reserved
                1110       CVTxT           Reserved
                1111       CVTxQ           Reserved

† Encodings for the instructions CVTST and CVTST/S are exceptions to this table; use the
  encodings in Appendix C.
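
The subfield boundaries in Table 4-12 can be illustrated with a short Python sketch that splits the function field of a 32-bit instruction word. The names `pack_function`, `split_function`, and `describe_opcode16` are hypothetical, and the sketch only covers the opcode 16₁₆ column; it does not handle the CVTST exceptions noted in the footnote.

```python
TRP_NAMES = {0b000: "imprecise (default)", 0b001: "/U or /V",
             0b101: "/SU or /SV", 0b111: "/SUI or /SVI"}
RND_NAMES = {0b00: "chopped (/C)", 0b01: "minus infinity (/M)",
             0b10: "normal (default)", 0b11: "dynamic (/D)"}
SRC_NAMES_16 = {0b00: "S_floating", 0b01: "reserved",
                0b10: "T_floating", 0b11: "Q_fixed"}


def pack_function(trp: int, rnd: int, src: int, fnc: int) -> int:
    """Build the 11-bit function field from its four subfields."""
    return (trp << 8) | (rnd << 6) | (src << 4) | fnc


def split_function(instruction: int) -> dict:
    """Extract TRP, RND, SRC, and FNC from instruction bits <15:5>."""
    func = (instruction >> 5) & 0x7FF
    return {"TRP": (func >> 8) & 0b111, "RND": (func >> 6) & 0b11,
            "SRC": (func >> 4) & 0b11, "FNC": func & 0b1111}


def describe_opcode16(instruction: int) -> dict:
    """Name the trapping mode, rounding mode, and source type (opcode 16 only)."""
    f = split_function(instruction)
    return {"TRP": TRP_NAMES.get(f["TRP"], "UNPREDICTABLE"),
            "RND": RND_NAMES[f["RND"]],
            "SRC": SRC_NAMES_16[f["SRC"]],
            "FNC": f["FNC"]}
```

For instance, TRP=000, RND=11, SRC=10, FNC=0000 packs to function code 0E0₁₆, which the table reads as an ADDx on T_floating with dynamic rounding.
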
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Table 4-13: VAX Floating-Point Function Field Bit Summary 


Bits    Field   Meaning

15-13   TRP     Trapping modes:

                Contents   Meaning for Opcodes 14₁₆ and 15₁₆
                000        Imprecise (default)
                001        Underflow enable (/U) - floating-point output
                           Integer overflow enable (/V) - integer output
                010        UNPREDICTABLE for opcode 15₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                011        UNPREDICTABLE for opcode 15₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                100        /S - Exception completion enable
                101        /SU - floating-point output
                           /SV - integer output
                110        UNPREDICTABLE for opcode 15₁₆ instructions
                           Reserved for opcode 14₁₆ instructions
                111        UNPREDICTABLE for opcode 15₁₆ instructions
                           Reserved for opcode 14₁₆ instructions

12-11   RND     Rounding modes:

                Contents   Meaning for Opcodes 15₁₆ and 14₁₆
                00         Chopped (/C)
                01         UNPREDICTABLE
                10         Normal (default)
                11         UNPREDICTABLE

10-9    SRC     Source datatype:†

                Contents   Meaning for     Meaning for
                           Opcode 15₁₆     Opcode 14₁₆
                00         F_floating      F_floating
                01         D_floating      F_floating
                10         G_floating      G_floating
                11         Q_fixed         Reserved

8-5     FNC     Instruction class:

                Contents   Meaning for     Meaning for
                           Opcode 15₁₆     Opcode 14₁₆
                0000       ADDx            Reserved
                0001       SUBx            Reserved
                0010       MULx            Reserved
                0011       DIVx            Reserved
                0100       CMPxUN          ITOFF
                0101       CMPxEQ          Reserved
                0110       CMPxLT          Reserved
                0111       CMPxLE          Reserved
                1000       Reserved        Reserved
                1001       Reserved        Reserved
                1010       Reserved        SQRTF/SQRTG
                1011       Reserved        Reserved
                1100       CVTxF           Reserved
                1101       CVTxD           Reserved
                1110       CVTxG           Reserved
                1111       CVTxQ           Reserved

† In the SRC field, both 00 and 01 specify the F_floating source datatype for opcode 14₁₆.


4.7.10 IEEE Standard 


The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985) is
included by reference. 


This standard leaves certain operations as implementation dependent. The remainder of this 
section specifies the behavior of the Alpha architecture in these situations. Note that this 
behavior may be supplied by either hardware (if the invalid operation disable, or INVD, bit is 
implemented) or by software. See Sections 4.7.7.10, 4.7.7.11, 4.7.8, 4.7.8.3, and Appendix B. 


4.7.10.1 Conversion of NaN and Infinity Values 


Conversion of a NaN or an Infinity value to an integer gives a result of zero. 


Conversion of a NaN value from S_floating to T_floating gives a result identical to the input, 
except that the most significant fraction bit (bit 51) is set to indicate a quiet NaN. 


Conversion of a NaN value from T_floating to S_floating gives a result identical to the input, 
except that the most significant fraction bit (bit 51) is set to indicate a quiet NaN, and bits 
<28:0> are cleared to zero. 
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4.7.10.2 Copying NaN Values 


Copying a NaN value without changing its precision does not cause an invalid operation 
exception. 


4.7.10.3 Generating NaN Values


When an operation is required to produce a NaN and none of its inputs are NaN values, the 
result of the operation is the quiet NaN value that has the sign bit set to one, all exponent bits 
set to one (to indicate a NaN), the most significant fraction bit set to one (to indicate that the 
NaN is quiet), and all other fraction bits cleared to zero. This value is referred to as "the canon- 
ical quiet NaN." 


4.7.10.4 Propagating NaN Values 


When an operation is required to produce a NaN and one or both of its inputs are NaN values, 
the IEEE standard requires that quiet NaN values be propagated when possible. With the
Alpha architecture, the result of such an operation is a NaN generated according to the first of 
the following rules that is applicable: 


1. If the operand in the Fb register of the operation is a quiet NaN, that value is used as the
   result.


2. If the operand in the Fb register of the operation is a signaling NaN, the result is the
   quiet NaN formed from the Fb value by setting the most significant fraction bit (bit 51)
   to a one bit.


3. If the operation uses its Fa operand and the value in the Fa register is a quiet NaN, that
   value is used as the result.


4. If the operation uses its Fa operand and the value in the Fa register is a signaling NaN,
   the result is the quiet NaN formed from the Fa value by setting the most significant
   fraction bit (bit 51) to a one bit.


5. The result is the canonical quiet NaN.
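
The ordered rules above can be sketched in Python over 64-bit T_floating register images. The function name `propagate_nan` is hypothetical, and passing `fa=None` is an assumed way to model an operation that does not use its Fa operand.

```python
EXP_MASK = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51
# Sign = 1, exponent all ones, top fraction bit set, other fraction bits clear.
CANONICAL_QNAN = 0xFFF8000000000000


def is_nan(bits: int) -> bool:
    """True when the exponent is all ones and the fraction is nonzero."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def propagate_nan(fb: int, fa=None) -> int:
    """Select the NaN result; fa=None models an operation without an Fa operand."""
    if is_nan(fb):
        return fb | QUIET_BIT      # quiet Fb is unchanged, signaling Fb is quieted
    if fa is not None and is_nan(fa):
        return fa | QUIET_BIT      # same treatment for the Fa operand
    return CANONICAL_QNAN          # no NaN input
```

Setting bit 51 leaves a quiet NaN unchanged, which is why one branch covers both the quiet and the signaling case for each operand.
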
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4.8 Memory Format Floating-Point Instructions 
The instructions in this section move data between the floating-point registers and memory. 
They use the Memory instruction format. They do not interpret the bits moved in any way; spe- 
cifically, they do not trap on non-finite values. 


The instructions are summarized in Table 4-14.


Table 4-14: Memory Format Floating-Point Instructions Summary

Mnemonic   Operation                                     Subset
LDF        Load F_floating                               VAX
LDG        Load G_floating (Load D_floating)             VAX
LDS        Load S_floating (Load Longword Integer)       Both
LDT        Load T_floating (Load Quadword Integer)       Both
STF        Store F_floating                              VAX
STG        Store G_floating (Store D_floating)           VAX
STS        Store S_floating (Store Longword Integer)     Both
STT        Store T_floating (Store Quadword Integer)     Both
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4.8.1 Load F_floating 


Format: 
LDF Fa.wf,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}


CASE
    big_endian_data:    va' ← va XOR 100₂
    little_endian_data: va' ← va
ENDCASE


Fa ← (va')<15> || MAP_F((va')<14:7>) || (va')<6:0> ||
     (va')<31:16> || 0<28:0>


Exceptions: 


Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDF Load F_floating 


Qualifiers: 


None 


Description: 


LDF fetches an F_floating datum from memory and writes it to register Fa. If the data is not 
naturally aligned, an alignment exception is generated. 


The MAP_F function causes the 8-bit memory-format exponent to be expanded to an 11-bit 
register-format exponent according to Table 2-1. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and 
any memory management fault is reported for va (not va'). The source operand is fetched
from memory and the bytes are reordered to conform to the F_floating register format. The 
result is then zero-extended in the low-order longword and written to register Fa. 
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4.8.2 Load G_floating 


Format: 
LDG Fa.wg,disp.ab(Rb.ab) !Memory format 


Operation: 


va ← {Rbv + SEXT(disp)}
Fa ← (va)<15:0> || (va)<31:16> || (va)<47:32> || (va)<63:48>


Exceptions: 


Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDG Load G_floating (Load D_floating) 


Qualifiers: 


None 


Description: 


LDG fetches a G_floating (or D_floating) datum from memory and writes it to register Fa. If 
the data is not naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from memory, the bytes are reordered to conform to the
G_floating register format (also conforming to the D_floating register format), and the result
is then written to register Fa.
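
The reordering in the Operation section amounts to reversing the four 16-bit words of the quadword. A minimal Python sketch, with the hypothetical helper name `swap_words64`:

```python
def swap_words64(q: int) -> int:
    """Reverse the order of the four 16-bit words in a 64-bit value.

    Models Fa <- mem<15:0> || mem<31:16> || mem<47:32> || mem<63:48>.
    """
    words = [(q >> (16 * i)) & 0xFFFF for i in range(4)]  # words[0] is mem<15:0>
    return (words[0] << 48) | (words[1] << 32) | (words[2] << 16) | words[3]
```

Because the same word reversal converts in both directions, applying it twice returns the original value.
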
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4.8.3 Load S_floating 


Format: 
LDS Fa.ws,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}


CASE
    big_endian_data:    va' ← va XOR 100₂
    little_endian_data: va' ← va
ENDCASE


Fa ← (va')<31> || MAP_S((va')<30:23>) || (va')<22:0> || 0<28:0>


Exceptions: 
Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDS Load S_floating (Load Longword Integer)


Qualifiers: 


None 


Description: 


LDS fetches a longword (integer or S_floating) from memory and writes it to register Fa. If 
the data is not naturally aligned, an alignment exception is generated. The MAP_S function 
causes the 8-bit memory-format exponent to be expanded to an 11-bit register-format expo- 
nent according to Table 2-2. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and 
any memory management fault is reported for va (not va'). The source operand is fetched
from memory, is zero-extended in the low-order longword, and then written to register Fa. 
Longword integers in floating registers are stored in bits <63:62,58:29>, with bits <61:59> 
ignored and zeros in bits <28:0>. 
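
The LDS expansion can be sketched in Python. The `map_s` function below encodes our reading of the exponent mapping (Table 2-2 is not reproduced in this section, so treat the mapping as an assumption), and the helper names are hypothetical.

```python
import struct


def map_s(exp8: int) -> int:
    """Expand an 8-bit S_floating exponent to 11 bits (assumed Table 2-2 mapping)."""
    high, low = exp8 >> 7, exp8 & 0x7F
    if exp8 == 0xFF:
        middle = 0b111          # all ones stays all ones (infinity, NaN)
    elif high == 1 or exp8 == 0:
        middle = 0b000
    else:
        middle = 0b111
    return (high << 10) | (middle << 7) | low


def lds(mem32: int) -> int:
    """Fa <- sign || MAP_S(exponent) || fraction || 0<28:0>."""
    sign = (mem32 >> 31) & 1
    exponent = map_s((mem32 >> 23) & 0xFF)
    fraction = mem32 & 0x7FFFFF
    return (sign << 63) | (exponent << 52) | (fraction << 29)


def float32_bits(x: float) -> int:
    """IEEE single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def double_from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For normalized values, zero, and infinity, the register image read as a T_floating value equals the original S_floating value, for example `double_from_bits(lds(float32_bits(-2.5)))` gives `-2.5`.
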
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4.8.4 Load T_floating 


Format: 
LDT Fa.wt,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}


Fa ← (va)<63:0>


Exceptions: 
Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDT Load T_floating (Load Quadword Integer) 


Qualifiers: 


None 


Description: 


LDT fetches a quadword (integer or T_floating) from memory and writes it to register Fa. If 
the data is not naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. The source operand is fetched from memory and written to register Fa. 
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4.8.5 Store F_floating 


Format: 
STF Fa.rf,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}


CASE
    big_endian_data:    va' ← va XOR 100₂
    little_endian_data: va' ← va
ENDCASE


(va')<31:0> ← Fav<44:29> || Fav<63:62> || Fav<58:45>


Exceptions:
Access Violation
Fault on Write
Alignment
Translation Not Valid


Instruction mnemonics: 


STF Store F_floating 


Qualifiers: 


None 


Description: 


STF stores an F_floating datum from Fa to memory. If the data is not naturally aligned, an 
alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and 
any memory management fault is reported for va (not va'). The bits of the source operand are
fetched from register Fa, the bits are reordered to conform to F_floating memory format, and 
the result is then written to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking 
is done. 
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4.8.6 Store G_floating 


Format: 


STG Fa.rg,disp.ab(Rb.ab) !Memory format 


Operation: 


va ← {Rbv + SEXT(disp)}
(va)<63:0> ← Fav<15:0> || Fav<31:16> || Fav<47:32> || Fav<63:48>


Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


STG Store G_floating (Store D_floating) 


Qualifiers: 


None 


Description: 


STG stores a G_floating (or D_floating) datum from Fa to memory. If the data is not naturally 
aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displacement.
The source operand is fetched from register Fa, the bytes are reordered to conform to
the G_floating memory format (also conforming to the D_floating memory format), and the
result is then written to memory.


4.8.7 Store S_floating


Format: 
STS Fa.rs,disp.ab(Rb.ab) !Memory format 


Operation: 
va ← {Rbv + SEXT(disp)}


CASE
    big_endian_data:    va' ← va XOR 100₂
    little_endian_data: va' ← va
ENDCASE


(va')<31:0> ← Fav<63:62> || Fav<58:29>


Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


STS Store S_floating (Store Longword Integer) 


Qualifiers: 


None 


Description: 


STS stores a longword (integer or S_floating) datum from Fa to memory. If the data is not nat- 
urally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. For a big-endian longword access, va<2> (bit 2 of the virtual address) is inverted, and 
any memory management fault is reported for va (not va’). The bits of the source operand are 
fetched from register Fa, the bits are reordered to conform to S_floating memory format, and 
the result is then written to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking 
is done. 
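
The bit selection performed by STS can be sketched in Python as follows. The helper name `sts` is hypothetical; the sketch models only the register-to-longword mapping, not the memory access.

```python
def sts(reg64: int) -> int:
    """mem<31:0> <- Fav<63:62> || Fav<58:29>; bits <61:59> and <28:0> are ignored."""
    top2 = (reg64 >> 62) & 0b11          # sign and high exponent bit
    low30 = (reg64 >> 29) & 0x3FFFFFFF   # remaining exponent bits and fraction
    return (top2 << 30) | low30
```

For example, the T_floating register image of 1.0 (3FF0 0000 0000 0000₁₆) is stored as the S_floating pattern 3F80 0000₁₆, and changing bits <61:59> or <28:0> of the register does not change the stored longword.
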
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4.8.8 Store T_floating 


Format: 


STT Fa.rt,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}
(va)<63:0> ← Fav<63:0>


Exceptions:


Access Violation
Fault on Write
Alignment
Translation Not Valid


Instruction mnemonics:


STT Store T_floating (Store Quadword Integer)


Qualifiers:


None


Description:


STT stores a quadword (integer or T_floating) datum from Fa to memory. If the data is not
naturally aligned, an alignment exception is generated.


The virtual address is computed by adding register Rb to the sign-extended 16-bit displace- 
ment. The source operand is fetched from register Fa and written to memory. 
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4.9 Branch Format Floating-Point Instructions 


Alpha provides six floating conditional branch instructions. These branch-format instructions 
test the value of a floating-point register and conditionally change the PC. 


They do not interpret the bits tested in any way; specifically, they do not trap on non-finite 
values. 


The test is based on the sign bit and whether the rest of the register is all zero bits. All 64 bits 
of the register are tested. The test is independent of the format of the operand in the register. 
Both plus and minus zero are equal to zero. A non-zero value with a sign of zero is greater 
than zero. A non-zero value with a sign of one is less than zero. No reserved operand or 
non-finite checking is done. 


The floating-point branch operations are summarized in Table 4-15: 


Table 4-15: Floating-Point Branch Instructions Summary 


Mnemonic   Operation                                Subset
FBEQ       Floating Branch Equal                    Both
FBGE       Floating Branch Greater Than or Equal    Both
FBGT       Floating Branch Greater Than             Both
FBLE       Floating Branch Less Than or Equal       Both
FBLT       Floating Branch Less Than                Both
FBNE       Floating Branch Not Equal                Both
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4.9.1 Conditional Branch 


Format: 
FBxx Fa.rq,disp.al !Branch format


Operation: 


{update PC}
va ← PC + {4*SEXT(disp)}
IF TEST(Fav, Condition_based_on_Opcode) THEN
    PC ← va


Exceptions: 


None 


Instruction mnemonics: 


FBEQ Floating Branch Equal 
FBGE Floating Branch Greater Than or Equal 
FBGT Floating Branch Greater Than 
FBLE Floating Branch Less Than or Equal 
FBLT Floating Branch Less Than 
FBNE Floating Branch Not Equal 
Qualifiers: 
None 
Description: 


Register Fa is tested. If the specified relationship is true, the PC is loaded with the target vir- 
tual address; otherwise, execution continues with the next sequential instruction. 


The displacement is treated as a signed longword offset. This means it is shifted left two bits 
(to address a longword boundary), sign-extended to 64 bits, and added to the updated PC to 
form the target virtual address. 


The conditional branch instructions are PC-relative only. The 21-bit signed displacement gives 
a forward/backward branch distance of +/-1M instructions.
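
The target address computation can be sketched in Python. The helper names `sext21` and `branch_target` are hypothetical; `updated_pc` stands for the PC after the {update PC} step.

```python
MASK64 = (1 << 64) - 1


def sext21(disp: int) -> int:
    """Sign-extend a 21-bit displacement field to a Python integer."""
    disp &= 0x1FFFFF
    return disp - (1 << 21) if disp & (1 << 20) else disp


def branch_target(updated_pc: int, disp: int) -> int:
    """va <- PC + 4*SEXT(disp), wrapped to 64 bits."""
    return (updated_pc + 4 * sext21(disp)) & MASK64
```

A displacement of 1 targets the instruction after the next one, and a displacement field of all ones (value -1) targets the branch instruction itself.
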
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Notes: 


• To branch properly on non-finite operands, compare to F31, then branch on the result of
  the compare.


• The largest negative integer (8000 0000 0000 0000₁₆) is the same bit pattern as floating
  minus zero, so it is treated as equal to zero by the branch instructions. To branch properly
  on the largest negative integer, convert it to floating or move it to an integer register
  and do an integer branch.
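
The register test used by the floating branches can be sketched in Python over raw 64-bit patterns. The function name `fp_test` and the two-letter condition strings are hypothetical conveniences.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63


def fp_test(fav: int, cond: str) -> bool:
    """Evaluate a branch condition from the sign bit and the remaining 63 bits."""
    zero = (fav & ~SIGN & MASK64) == 0      # plus and minus zero both count as zero
    negative = bool(fav & SIGN) and not zero
    results = {
        "EQ": zero,
        "NE": not zero,
        "LT": negative,
        "LE": negative or zero,
        "GT": not negative and not zero,
        "GE": not negative,
    }
    return results[cond]
```

The pattern 8000 0000 0000 0000₁₆ tests as equal to zero, and a positive NaN pattern tests as greater than zero, which shows why non-finite operands need a compare instruction first.
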
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4.10 Floating-Point Operate Format Instructions 


The floating-point bit-operate instructions perform copy and integer convert operations on
64-bit register values. The bit-operate instructions do not interpret the bits moved in any way;
specifically, they do not trap on non-finite values.


The floating-point arithmetic-operate instructions perform add, subtract, multiply, divide,
compare, register move, square root, and floating convert operations on 64-bit register values
in one of the four specified floating formats.


Each instruction specifies the source and destination formats of the values, as well as the
rounding mode and trapping mode to be used. These instructions use the Floating-point
Operate format.


The floating-point operate instructions are summarized in Table 4-16.


Table 4-16: Floating-Point Operate Instructions Summary

Mnemonic   Operation                                             Subset

Bit and FPCR Operations:

CPYS       Copy Sign                                             Both
CPYSE      Copy Sign and Exponent                                Both
CPYSN      Copy Sign Negate                                      Both
CVTLQ      Convert Longword to Quadword                          Both
CVTQL      Convert Quadword to Longword                          Both
FCMOVxx    Floating Conditional Move                             Both
MF_FPCR    Move from Floating-point Control Register             Both
MT_FPCR    Move to Floating-point Control Register               Both

Arithmetic Operations:

ADDF       Add F_floating                                        VAX
ADDG       Add G_floating                                        VAX
ADDS       Add S_floating                                        IEEE
ADDT       Add T_floating                                        IEEE
CMPGxx     Compare G_floating                                    VAX
CMPTxx     Compare T_floating                                    IEEE
CVTDG      Convert D_floating to G_floating                      VAX
CVTGD      Convert G_floating to D_floating                      VAX
CVTGF      Convert G_floating to F_floating                      VAX
CVTGQ      Convert G_floating to Quadword                        VAX
CVTQF      Convert Quadword to F_floating                        VAX
CVTQG      Convert Quadword to G_floating                        VAX
CVTQS      Convert Quadword to S_floating                        IEEE
CVTQT      Convert Quadword to T_floating                        IEEE
CVTST      Convert S_floating to T_floating                      IEEE
CVTTQ      Convert T_floating to Quadword                        IEEE
CVTTS      Convert T_floating to S_floating                      IEEE
DIVF       Divide F_floating                                     VAX
DIVG       Divide G_floating                                     VAX
DIVS       Divide S_floating                                     IEEE
DIVT       Divide T_floating                                     IEEE
FTOIS      Floating-point to integer register move, S_floating   IEEE
FTOIT      Floating-point to integer register move, T_floating   IEEE
ITOFF      Integer to floating-point register move, F_floating   VAX
ITOFS      Integer to floating-point register move, S_floating   IEEE
ITOFT      Integer to floating-point register move, T_floating   IEEE
MULF       Multiply F_floating                                   VAX
MULG       Multiply G_floating                                   VAX
MULS       Multiply S_floating                                   IEEE
MULT       Multiply T_floating                                   IEEE
SQRTF      Square root F_floating                                VAX
SQRTG      Square root G_floating                                VAX
SQRTS      Square root S_floating                                IEEE
SQRTT      Square root T_floating                                IEEE
SUBF       Subtract F_floating                                   VAX
SUBG       Subtract G_floating                                   VAX
SUBS       Subtract S_floating                                   IEEE
SUBT       Subtract T_floating                                   IEEE

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4.10.1 Copy Sign 


Format: 
CPYSy Fa.rq,Fb.rq,Fc.wq !Floating-point Operate format


Operation: 


CASE
    CPYS:  Fc ← Fav<63> || Fbv<62:0>
    CPYSN: Fc ← NOT(Fav<63>) || Fbv<62:0>
    CPYSE: Fc ← Fav<63:52> || Fbv<51:0>
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


CPYS Copy Sign 
CPYSE Copy Sign and Exponent 
CPYSN Copy Sign Negate 
Qualifiers: 
None 
Description: 


For CPYS and CPYSN, the sign bit of Fa is fetched (and complemented in the case of
CPYSN) and concatenated with the exponent and fraction bits from Fb; the result is stored in
Fc.


For CPYSE, the sign and exponent bits from Fa are fetched and concatenated with the fraction 
bits from Fb; the result is stored in Fc. 


No checking of the operands is performed. 


Notes: 


• Register moves can be performed using CPYS Fx,Fx,Fy. Floating-point absolute value
  can be done using CPYS F31,Fx,Fy. Floating-point negation can be done using
  CPYSN Fx,Fx,Fy. Floating values can be scaled to a known range by using CPYSE.
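
The three copy-sign operations and the idioms in the note can be sketched in Python over 64-bit patterns. The function names mirror the mnemonics but are otherwise hypothetical; the `bits` and `value` helpers assume T_floating (IEEE double) data, and F31 is modeled as the value zero.

```python
import struct

MASK64 = (1 << 64) - 1
SIGN = 1 << 63
FRACTION = (1 << 52) - 1


def cpys(fa: int, fb: int) -> int:
    """Fc <- Fav<63> || Fbv<62:0>"""
    return (fa & SIGN) | (fb & ~SIGN & MASK64)


def cpysn(fa: int, fb: int) -> int:
    """Fc <- NOT(Fav<63>) || Fbv<62:0>"""
    return ((fa ^ SIGN) & SIGN) | (fb & ~SIGN & MASK64)


def cpyse(fa: int, fb: int) -> int:
    """Fc <- Fav<63:52> || Fbv<51:0>"""
    return (fa & ~FRACTION & MASK64) | (fb & FRACTION)


def bits(x: float) -> int:
    """IEEE double bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def value(b: int) -> float:
    """IEEE double represented by the 64-bit pattern b."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]
```

With these helpers, `value(cpys(0, bits(-3.5)))` is the absolute value 3.5, and `value(cpyse(bits(1.0), bits(6.0)))` scales 6.0 into the range [1, 2) as 1.5.
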




4.10.2 Convert Integer to Integer 


Format: 
CVTxy Fb.rq,Fc.wx !Floating-point Operate format


Operation: 
CASE 
CVTQL: Fc ← Fbv<31:30> || 0<2:0> || Fbv<29:0> || 0<28:0>
CVTLQ: Fc ← SEXT(Fbv<63:62> || Fbv<58:29>)
ENDCASE 
Exceptions: 


Integer Overflow, CVTQL only 


Instruction mnemonics: 


CVTLQ Convert Longword to Quadword 

CVTQL Convert Quadword to Longword 
Qualifiers: 

Trapping: Exception Completion (/S) (CVTQL only)


Integer Overflow Enable (/V) (CVTQL only) 


Description: 


The two’s-complement operand in register Fb is converted to a two’s-complement result and 
written to register Fc. Register Fa must be F31. 

The conversion from quadword to longword is a repositioning of the low 32 bits of the operand,
with zero fill and optional integer overflow checking. Integer overflow occurs if Fb is
outside the range -2**31..2**31-1. If integer overflow occurs, the truncated result is stored in
Fc, and an arithmetic trap is taken if enabled.


The conversion from longword to quadword is a repositioning of 32 bits of the operand, with 
sign extension. 
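
The two repositionings in the Operation section can be sketched as follows. This is a minimal model that works on 64-bit register images held as Python integers; overflow detection and trapping are not modeled, and the function names are chosen for the example.

```python
MASK64 = (1 << 64) - 1

def cvtql(fbv):
    """Fc <- Fbv<31:30> || 0<2:0> || Fbv<29:0> || 0<28:0>"""
    hi2 = (fbv >> 30) & 0x3          # Fbv<31:30> goes to Fc<63:62>
    lo30 = fbv & 0x3FFFFFFF          # Fbv<29:0> goes to Fc<58:29>
    return (hi2 << 62) | (lo30 << 29)

def cvtlq(fbv):
    """Fc <- SEXT(Fbv<63:62> || Fbv<58:29>)"""
    longword = (((fbv >> 62) & 0x3) << 30) | ((fbv >> 29) & 0x3FFFFFFF)
    if longword & 0x80000000:        # sign-extend bit 31 to 64 bits
        longword -= 1 << 32
    return longword & MASK64

minus_seven = -7 & MASK64
round_trip = cvtlq(cvtql(minus_seven))
```

For any value that fits in a longword, CVTLQ applied to the result of CVTQL returns the original quadword.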
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4.10.3 Floating-Point Conditional Move 


Format: 
FCMOVxx Fa.rq,Fb.rq,Fc.wq !Floating-point Operate format


Operation:


IF TEST(Fav, Condition_based_on_Opcode) THEN
Fc ← Fbv


Exceptions: 


None 


Instruction mnemonics: 


FCMOVEQ FCMOVE if Register Equal to Zero
FCMOVGE FCMOVE if Register Greater Than or Equal to Zero
FCMOVGT FCMOVE if Register Greater Than Zero
FCMOVLE FCMOVE if Register Less Than or Equal to Zero
FCMOVLT FCMOVE if Register Less Than Zero
FCMOVNE FCMOVE if Register Not Equal to Zero


Qualifiers:


None


Description:


Register Fa is tested. If the specified relationship is true, register Fb is written to register Fc; 
otherwise, the move is suppressed and register Fc is unchanged. The test is based on the sign 
bit and whether the rest of the register is all zero bits, as described for floating branches in
Section 4.9.
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Notes: 


Except that it is likely in many implementations to be substantially faster, the instruction: 
FCMOVxx Fa,Fb,Fc 
is exactly equivalent to: 
FByy Fa, label ! yy = NOT xx 


CPYS Fb,Fb,Fc 
label: 


For example, a branchless sequence for: 
F1=MAX(F1,F2) 
is:


CMPxLT F1,F2,F3 ! F3=one if F1<F2; x=F/G/S/T 
FCMOVNE F3,F2,F1 ! Move F2 to F1 if F1<F2
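
As a rough illustration of the test and the branchless MAX sequence, the sketch below models registers as 64-bit integers. It assumes the test looks only at the sign bit and at whether bits <62:0> are all zero, so that minus zero counts as zero; the function names and the TRUE_BITS constant are introduced for the example.

```python
import struct

TRUE_BITS = 0x4000_0000_0000_0000   # non-zero value written by a true compare

def fp_test(fav, cond):
    """Evaluate the FCMOVxx condition from the sign bit and the remaining bits."""
    negative = bool((fav >> 63) & 1)
    zero = (fav & ((1 << 63) - 1)) == 0
    return {
        "EQ": zero,
        "NE": not zero,
        "LT": negative and not zero,
        "LE": negative or zero,
        "GT": (not negative) and not zero,
        "GE": (not negative) or zero,
    }[cond]

def fcmov(cond, fav, fbv, fcv):
    """Return the new Fc: Fbv if the test on Fav holds, otherwise Fc unchanged."""
    return fbv if fp_test(fav, cond) else fcv

def fmax(f1, f2):
    """Model of CMPxLT F1,F2,F3 followed by FCMOVNE F3,F2,F1 on Python floats."""
    f3 = TRUE_BITS if f1 < f2 else 0
    to_bits = lambda x: struct.unpack("<Q", struct.pack("<d", x))[0]
    result = fcmov("NE", f3, to_bits(f2), to_bits(f1))
    return struct.unpack("<d", struct.pack("<Q", result))[0]
```

Calling fmax(1.5, 4.0) follows the taken-move path, while fmax(4.0, 1.5) leaves the destination unchanged.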
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4.10.4 Move from/to Floating-Point Control Register 


Format: 
Mx_FPCR Fa.rq,Fa.rq,Fa.wq !Floating-point Operate format 


Operation: 


CASE 
MF_FPCR: Fa ← FPCR
MT_FPCR: FPCR ← Fav
ENDCASE 


Exceptions: 


None 


Instruction mnemonics: 


MF_FPCR Move from Floating-point Control Register 
MT_FPCR Move to Floating-point Control Register 
Qualifiers: 
None 
Description: 


The Floating-point Control Register (FPCR) is read from (MF_FPCR) or written to 
(MT_FPCR), a floating-point register. The floating-point register to be used is specified by the 
Fa, Fb, and Fc fields all pointing to the same floating-point register. If the Fa, Fb, and Fc fields 
do not all point to the same floating-point register, then it is UNPREDICTABLE which regis- 
ter is used. If the Fa, Fb, and Fc fields do not all point to the same floating-point register, the 
resulting values in the Fc register and in FPCR are UNPREDICTABLE. 


If the Fc field is F31 in the case of MT_FPCR, the resulting value in FPCR is
UNPREDICTABLE. 


The use of these instructions and the FPCR are described in Section 4.7.8. 
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4.10.5 VAX Floating Add 


Format: 
ADDx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 


Fc ← Fav + Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 


Instruction mnemonics: 


ADDF Add F_floating 
ADDG Add G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Description: 


Register Fa is added to register Fb, and the sum is written to register Fc. 


The sum is rounded or chopped to the specified precision, and then the corresponding range is 
checked for overflow/underflow. The single-precision operation on canonical single-precision 
values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if 
this occurs. See Section 4.7.7 for details of the stored result on overflow or underflow. 
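
The operand check in the last paragraph can be sketched as a predicate on the 64-bit register image. This is an illustration that assumes the register layout of sign in bit <63>, exponent in bits <62:52>, and fraction in bits <51:0>, with a true zero being all 64 bits clear; the function name is invented.

```python
def vax_operand_traps(fv):
    """True if a VAX operand would signal invalid operation: exp=0 but not a true zero."""
    exponent = (fv >> 52) & 0x7FF
    return exponent == 0 and fv != 0

true_zero = 0
reserved_operand = 1 << 63          # exp=0 with the sign bit set
dirty_zero = 1                      # exp=0, sign clear, non-zero fraction
ordinary_value = 0x4000_0000_0000_0000
```

Only the reserved operand and the dirty zero satisfy the predicate; the true zero and the ordinary value do not.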
4.10.6 IEEE Floating Add


Format: 
ADDx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 
Fc ← Fav + Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


ADDS Add S_floating
ADDT Add T_floating
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Inexact Enable (/I)
Description: 


Register Fa is added to register Fb, and the sum is written to register Fc. 
The sum is rounded to the specified precision and then the corresponding range is checked for 
overflow/underflow. The single-precision operation on canonical single-precision values
produces a canonical single-precision result.


See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result. 
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4.10.7 VAX Floating Compare 


Format: 
CMPGyy Fa.rq,Fb.rq,Fc.wq !Floating-point Operate format


Operation: 


IF Fav SIGNED_RELATION Fbv THEN
Fc ← 4000 0000 0000 0000₁₆
ELSE
Fc ← 0000 0000 0000 0000₁₆


Exceptions: 


Invalid Operation 


Instruction mnemonics: 


CMPGEQ Compare G_floating Equal 

CMPGLE Compare G_floating Less Than or Equal 

CMPGLT Compare G_floating Less Than 
Qualifiers: 

Trapping: Exception Completion (/S) 
Description: 


The two operands in Fa and Fb are compared. If the relationship specified by the qualifier is 
true, a non-zero floating value (0.5) is written to register Fc; otherwise, a true zero is written to 
Fc.


Comparisons are exact and never overflow or underflow. Three mutually exclusive relations 
are possible: less than, equal, and greater than. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if 
this occurs.


Notes: 


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less
Than or Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only 
the less-than operations are included. 
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4.10.8 IEEE Floating Compare 


Format: 
CMPTyy Fa.rx,Fb.rx,Fc.wq !Floating-point Operate format


Operation: 


IF Fav SIGNED_RELATION Fbv THEN
Fc ← 4000 0000 0000 0000₁₆
ELSE
Fc ← 0000 0000 0000 0000₁₆


Exceptions: 


Invalid Operation 


Instruction mnemonics: 


CMPTEQ Compare T_floating Equal 

CMPTLE Compare T_floating Less Than or Equal 

CMPTLT Compare T_floating Less Than 

CMPTUN Compare T_floating Unordered
Qualifiers: 

Trapping: Exception Completion (/SU) 
Description: 


The two operands in Fa and Fb are compared. If the relationship specified by the qualifier is 
true, a non-zero floating value (2.0) is written to register Fc; otherwise, a true zero is written to 
Fc. 


Comparisons are exact and never overflow or underflow. Four mutually exclusive relations are 
possible: less than, equal, greater than, and unordered. The unordered relation is true if one or 
both operands are NaN. (This behavior must be provided by an operating system (OS) comple- 
tion handler, since NaNs trap.) Comparisons ignore the sign of zero, so +0 = -0. 


Comparisons with plus and minus infinity execute normally and do not take an invalid opera- 
tion trap. \This was added to support fast path selection through infinity testing in scientific 
codes.\ 


Notes: 


• In order to use CMPTxx with exception completion handling, it is necessary to specify
the /SU IEEE trap mode, even though an underflow trap is not possible. 


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare Less
Than or Equal A,B is the same as Compare Greater Than or Equal B,A. Therefore, only 
the less-than operations are included. 
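
The compare results described in this section can be modeled with a short sketch. It assumes T_floating operands held as Python floats and shows the result a program would observe after any NaN handling by the completion handler; trapping itself is not modeled, and the names cmpt, TRUE_BITS, and as_float are invented for the example.

```python
import math
import struct

TRUE_BITS = 0x4000_0000_0000_0000   # written to Fc when the relation holds

def cmpt(relation, a, b):
    """Return the Fc bit pattern for CMPTEQ/CMPTLE/CMPTLT/CMPTUN ("EQ", "LE", "LT", "UN")."""
    unordered = math.isnan(a) or math.isnan(b)
    if relation == "UN":
        holds = unordered
    elif unordered:
        holds = False                # EQ, LE, LT are all false when unordered
    elif relation == "EQ":
        holds = a == b               # +0 and -0 compare equal
    elif relation == "LE":
        holds = a <= b
    else:
        holds = a < b
    return TRUE_BITS if holds else 0

def as_float(pattern):
    """Interpret a 64-bit pattern as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", pattern))[0]
```

Read as T_floating, the true pattern decodes to 2.0, matching the Description; the same bits read as G_floating give the 0.5 quoted for the VAX compare. A greater-than test is obtained by swapping the operands of the less-than form.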
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4.10.9 Convert VAX Floating to Integer 
Format: 
CVTGQ Fb.rx,Fc.wq !Floating-point Operate format


Operation: 


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 
Integer Overflow 


Instruction mnemonics: 


CVTGQ Convert G_floating to Quadword 
Qualifiers: 

Rounding: Chopped (/C)

Trapping: Exception Completion (/S) 


Integer Overflow Enable (/V) 


Description: 


The floating operand in register Fb is converted to a two’s-complement quadword number and 
written to register Fc. The conversion aligns the operand fraction with the binary point just to 
the right of bit zero, rounds as specified, and complements the result if negative. Register Fa 
must be F31. 


An invalid operation trap is signaled if the operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.


See Section 4.7.7 for details of the stored result on integer overflow. 
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4.10.10 Convert Integer to VAX Floating 


Format: 


CVTQy Fb.rq,Fc.wx !Floating-point Operate format


Operation: 


Fc ← {conversion of Fbv<63:0>}


Exceptions: 


None 


Instruction mnemonics: 


CVTQF Convert Quadword to F_floating
CVTQG Convert Quadword to G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Description: 


The two’s-complement quadword operand in register Fb is converted to a single- or dou- 
ble-precision floating result and written to register Fc. The conversion complements a number 
if negative, normalizes it, rounds to the target precision, and packs the result with an appropri- 
ate sign and exponent field. Register Fa must be F31. 
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4.10.11 Convert VAX Floating to VAX Floating 


Format: 
CVTxy Fb.rx,Fc.wx !Floating-point Operate format 


Operation: 


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 
Overflow 
Underflow 


Instruction mnemonics: 


CVTDG Convert D_floating to G_floating 

CVTGD Convert G_floating to D_floating 

CVTGF Convert G_floating to F_floating 
Qualifiers: 

Rounding: Chopped (/C) 

Trapping: Exception Completion (/S) 

Underflow Enable (/U) 

Description: 


The floating operand in register Fb is converted to the specified alternate floating format and 
written to register Fc. Register Fa must be F31. 


An invalid operation trap is signaled if the operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs. 


See Section 4.7.7 for details of the stored result on overflow or underflow. 


Notes: 


• The only arithmetic operations on D_floating values are conversions to and from
G_floating. The conversion to G_floating rounds or chops as specified, removing three 
fraction bits. The conversion from G_floating to D_floating adds three low-order zeros 
as fraction bits, then the 8-bit exponent range is checked for overflow/underflow. 


• The conversion from G_floating to F_floating rounds or chops to single precision, then
the 8-bit exponent range is checked for overflow/underflow. 


• No conversion from F_floating to G_floating is required, since F_floating values are
always stored in registers as equivalent G_floating values. 
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4.10.12 Convert IEEE Floating to Integer 


Format: 
CVTTQ Fb.rx,Fc.wq !Floating-point Operate format


Operation: 


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 
Inexact Result 
Integer Overflow 


Instruction mnemonics: 


CVTTQ Convert T_floating to Quadword 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Integer Overflow Enable (/V) 
Inexact Enable (/I) 
Description: 


The floating operand in register Fb is converted to a two’s-complement number and written to 
register Fc. The conversion aligns the operand fraction with the binary point just to the right of 
bit zero, rounds as specified, and complements the result if negative. Register Fa must be F31. 
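
The effect of the rounding qualifiers on the conversion can be illustrated with a small sketch. It assumes a finite T_floating operand held as a Python float, treats the unqualified form as round to nearest even, and does not model dynamic rounding (/D), overflow detection, or traps; the function name and the mode strings are invented.

```python
import math

MASK64 = (1 << 64) - 1

def cvttq(x, rounding="normal"):
    """Convert a finite float to a two's-complement quadword image."""
    if rounding == "C":              # chopped: round toward zero
        q = math.trunc(x)
    elif rounding == "M":            # round toward minus infinity
        q = math.floor(x)
    else:                            # assumed default: round to nearest, ties to even
        q = round(x)
    return q & MASK64                # keep only the low 64 bits

chopped = cvttq(-2.5, "C")
toward_minus_infinity = cvttq(-2.5, "M")
nearest_even = cvttq(-2.5)
```

The three modes give -2, -3, and -2 respectively for an operand of -2.5, each stored as a 64-bit two's-complement pattern.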


See Section 4.7.7 for details of the stored result on integer overflow and inexact result. 
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4.10.13 Convert Integer to IEEE Floating 


Format: 
CVTQy Fb.rq,Fc.wx !Floating-point Operate format


Operation: 


Fc ← {conversion of Fbv<63:0>}


Exceptions:


Inexact Result 


Instruction mnemonics: 


CVTQS Convert Quadword to S_floating 
CVTQT Convert Quadword to T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Inexact Enable (/I) 
Description: 


The two’s-complement operand in register Fb is converted to a single- or double-precision 
floating result and written to register Fc. The conversion complements a number if negative, 
normalizes it, rounds to the target precision, and packs the result with an appropriate sign and 
exponent field. Register Fa must be F31. 


See Section 4.7.7 for details of the stored result on inexact result. 


Notes: 


• In order to use CVTQS or CVTQT with exception completion handling, it is necessary
to specify the /SUI IEEE trap mode, even though an underflow trap is not possible.
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4.10.14 Convert IEEE S_Floating to IEEE T_Floating 


Format: 
CVTST Fb.rx,Fc.wx ! Floating-point Operate format 


Operation: 


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 


Instruction mnemonics: 


CVTST Convert S_floating to T_floating 
Qualifiers: 

Trapping: Exception Completion (/S) 
Description: 


The S_floating operand in register Fb is converted to T_floating format and written to register 
Fc. Register Fa must be F31. 


Notes: 


• The conversion from S_floating to T_floating is exact. No rounding occurs. No underflow,
overflow, or inexact result can occur. In fact, the conversion for finite values is the
identity transformation.


• A trap handler can convert an S_floating denormal value into the corresponding
T_floating finite value by adding 896 to the exponent and normalizing. 
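
The constant 896 in the note above is the difference between the T_floating and S_floating exponent biases (1023 - 127). The sketch below illustrates this with ordinary IEEE single and double memory formats via Python's struct module; it is not a model of the trap handler, and the function name is invented.

```python
import struct

def s_to_t(s_bits):
    """Widen a 32-bit IEEE single pattern to the 64-bit IEEE double pattern."""
    (x,) = struct.unpack("<f", struct.pack("<I", s_bits))
    return struct.unpack("<Q", struct.pack("<d", x))[0]

one = s_to_t(0x3F800000)                 # 1.0 in single precision
pi_single = 0x40490FDB                   # single-precision approximation of pi
pi_double_exp = (s_to_t(pi_single) >> 52) & 0x7FF
smallest_denormal = s_to_t(0x00000001)   # 2**-149 as a single denormal
```

For a normal value, the 11-bit exponent is the 8-bit exponent plus 896. The smallest single denormal, 2**-149, becomes a normal double with biased exponent 1023 - 149 = 874 and a zero fraction.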
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4.10.15 Convert IEEE T_Floating to IEEE S_Floating 


Format: 


CVTTS Fb.rx,Fc.wx !Floating-point Operate format


Operation: 


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


CVTTS Convert T_floating to S_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Inexact Enable (/I) 
Description: 


The T_floating operand in register Fb is converted to S_floating format and written to register 
Fc. Register Fa must be F31. 


See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result. 
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4.10.16 VAX Floating Divide 


Format: 
DIVx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 
Fc ← Fav / Fbv


Exceptions: 
Invalid Operation 
Division by Zero 
Overflow 
Underflow 


Instruction mnemonics: 


DIVF Divide F_floating 
DIVG Divide G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Description: 


The dividend operand in register Fa is divided by the divisor operand in register Fb and the 
quotient is written to register Fc. 


The quotient is rounded or chopped to the specified precision and then the corresponding 
range is checked for overflow/underflow. The single-precision operation on canonical sin- 
gle-precision values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.


A division by zero trap is signaled if Fbv is zero. The contents of Fc are UNPREDICTABLE 
if this occurs. 


See Section 4.7.7 for details of the stored result on overflow or underflow. 
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4.10.17 IEEE Floating Divide 


Format: 
DIVx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 
Fc ← Fav / Fbv


Exceptions: 
Invalid Operation 
Division by Zero 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


DIVS Divide S_floating 
DIVT Divide T_floating 
Qualifiers: 
Rounding: Dynamic (/D)
Minus infinity (/M)
Chopped (/C)
Trapping: Exception Completion (/S)
Underflow Enable (/U)
Inexact Enable (/I)


Description: 


The dividend operand in register Fa is divided by the divisor operand in register Fb and the 
quotient is written to register Fc. 


The quotient is rounded to the specified precision and then the corresponding range is checked 
for overflow/underflow. The single-precision operation on canonical single-precision values
produces a canonical single-precision result.


See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result.
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4.10.18 Floating-Point Register to Integer Register Move 


Format: 


FTOIx Fa.rq,Rc.wq !Floating-point Operate format 


Operation: 
CASE
FTOIS:
Rc<63:32> ← SEXT(Fav<63>)
Rc<31:0> ← Fav<63:62> || Fav<58:29>
FTOIT:
Rc ← Fav
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


FTOIS Floating-point to Integer Register Move, S_floating 
FTOIT Floating-point to Integer Register Move, T_floating 
Qualifiers: 
None 
Description: 


Data in a floating-point register file is moved to an integer register file. 
The Fb field must be F31. 


The instructions do not interpret bits in the register files; specifically, the instructions do not 
trap on non-finite values. Also, the instructions do not access memory. 


FTOIS is exactly equivalent to the sequence:
STS
LDL 


FTOIT is exactly equivalent to the sequence: 
STT 
LDQ 
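
The FTOIS and FTOIT bit movements in the Operation section can be sketched as follows, with registers modeled as 64-bit integers. The function names are chosen for the example.

```python
MASK64 = (1 << 64) - 1

def ftois(fav):
    """Rc<63:32> <- SEXT(Fav<63>); Rc<31:0> <- Fav<63:62> || Fav<58:29>"""
    low = (((fav >> 62) & 0x3) << 30) | ((fav >> 29) & 0x3FFFFFFF)
    high = 0xFFFFFFFF if (fav >> 63) & 1 else 0
    return (high << 32) | low

def ftoit(fav):
    """Rc <- Fav (bits are copied without interpretation)."""
    return fav & MASK64

plus_one = ftois(0x3FF0000000000000)    # register image of 1.0
minus_one = ftois(0xBFF0000000000000)   # register image of -1.0
```

For a register holding 1.0, the low longword of the result is the single-precision memory image 3F800000 (hex); for -1.0 the upper 32 bits are filled with the sign.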
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Software Note: 


FTOIS and FTOIT are no slower than the corresponding store/load sequence and can be 
significantly faster.


\Implementation Note: 


EV4 and EV5 processors and their derivatives shall provide operating system emulation code
for these instructions.\ 
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4.10.19 Integer Register to Floating-Point Register Move 


Format: 


ITOFx Ra.rq,Fc.wq !Floating-point Operate format


Operation: 


CASE
ITOFF:
Fc ← Rav<31> || MAP_F(Rav<30:23>) || Rav<22:0> || 0<28:0>
ITOFS:
Fc ← Rav<31> || MAP_S(Rav<30:23>) || Rav<22:0> || 0<28:0>
ITOFT:
Fc ← Rav
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


ITOFF Integer to Floating-point Register Move, F_floating 
ITOFS Integer to Floating-point Register Move, S_floating
ITOFT Integer to Floating-point Register Move, T_floating 
Qualifiers: 
None 
Description: 


Data in an integer register file is moved to a floating-point register file. 
The Rb field must be R31. 


The instructions do not interpret bits in the register files; specifically, the instructions do not 
trap on non-finite values. Also, the instructions do not access memory. 


ITOFF is equivalent to the following sequence, except that the word swapping that LDF nor- 
mally performs is not performed by ITOFF: 


STL 
LDF 




ITOFS is exactly equivalent to the sequence: 


STL 
LDS 
ITOFT is exactly equivalent to the sequence: 


STQ 
LDT 
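
A sketch of ITOFS follows. It assumes that MAP_S expands the 8-bit S_floating exponent to 11 bits the same way an S_floating load does (all ones stays all ones, zero stays zero, otherwise the high bit is followed by three copies of its complement); that mapping is defined elsewhere in the manual and is restated here as an assumption. The function names are invented.

```python
def map_s(exp8):
    """Assumed 8-bit to 11-bit exponent expansion for S_floating."""
    if exp8 == 0xFF:
        return 0x7FF                      # infinity / NaN exponent
    if exp8 == 0:
        return 0                          # zero / denormal exponent
    if exp8 & 0x80:
        return 0x400 | (exp8 & 0x7F)      # 1 000 xxxxxxx
    return 0x380 | (exp8 & 0x7F)          # 0 111 xxxxxxx

def itofs(rav):
    """Fc <- Rav<31> || MAP_S(Rav<30:23>) || Rav<22:0> || 0<28:0>"""
    sign = (rav >> 31) & 1
    exponent = map_s((rav >> 23) & 0xFF)
    fraction = rav & 0x7FFFFF
    return (sign << 63) | (exponent << 52) | (fraction << 29)

one = itofs(0x3F800000)          # single-precision 1.0
minus_two = itofs(0xC0000000)    # single-precision -2.0
infinity = itofs(0x7F800000)     # single-precision +infinity
```

Under this assumption the results are the double-precision images of 1.0, -2.0, and +infinity, which is consistent with the STL/LDS equivalence stated above.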


Software Note: 


ITOFF, ITOFS, and ITOFT are no slower than the corresponding store/load sequence and 
can be significantly faster. 


\Implementation Note: 


EV4 and EV5 processors and their derivatives shall provide operating system emulation code
for these instructions.\
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4.10.20 VAX Floating Multiply 


Format: 
MULx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 


Fc ← Fav * Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 


Instruction mnemonics: 


MULF Multiply F_floating 
MULG Multiply G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Description: 


The multiplicand operand in register Fb is multiplied by the multiplier operand in register Fa 
and the product is written to register Fc. 


The product is rounded or chopped to the specified precision and then the corresponding range 
is checked for overflow/underflow. The single-precision operation on canonical single-preci- 
sion values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.


See Section 4.7.7 for details of the stored result on overflow or underflow. 
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4.10.21 IEEE Floating Multiply 


Format: 
MULx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format 


Operation: 
Fc ← Fav * Fbv


Exceptions: 


Invalid Operation 
Overflow 

Underflow
Inexact Result 


Instruction mnemonics: 


MULS Multiply S_floating 
MULT Multiply T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Inexact Enable (/I)
Description: 


The multiplicand operand in register Fb is multiplied by the multiplier operand in register Fa 
and the product is written to register Fc. 


The product is rounded to the specified precision and then the corresponding range is checked 
for overflow/underflow. The single-precision operation on canonical single-precision values
produces a canonical single-precision result.


See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result. 
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4.10.22 VAX Floating Square Root 


Format: 
SQRTx Fb.rx,Fc.wx !Floating-point Operate format


Operation: 
Fc ← Fb ** (1/2)


Exceptions: 


Invalid operation 


Instruction mnemonics: 


SQRTF Square root F_floating 

SQRTG Square root G_floating 
Qualifiers: 

Rounding: Chopped (/C) 

Trapping: Exception Completion (/S) 


Underflow Enable (/U) — See Notes below 


Description: 
The square root of the floating-point operand in register Fb is written to register Fc. (The Fa 
field of this instruction must be set to a value of F31.) 


The result is rounded or chopped to the specified precision. The single-precision operation on 
a canonical single-precision value produces a canonical single-precision result. 


An invalid operation is signaled if the operand has exp=0 and is not a true zero (that is, VAX 
reserved operands and dirty zeros trap). An invalid operation is signaled if the sign of the oper- 
and is negative. 


The contents of Fc are UNPREDICTABLE if an invalid operation is signaled.


Notes: 


• Floating-point overflow and underflow are not possible for square root operation. The
underflow enable qualifier is ignored. 




4.10.23 IEEE Floating Square Root 


Format: 
SQRTx Fb.rx,Fc.wx !Floating-point Operate format


Operation: 
Fc ← Fb ** (1/2)


Exceptions: 


Inexact result 
Invalid operation 


Instruction mnemonics: 


SQRTS Square root S_floating 
SQRTT Square root T_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Dynamic (/D) 
Minus infinity (/M) 
Trapping: Inexact Enable (/I) 


Exception Completion (/S) 
Underflow Enable (/U) — See Notes below 


Description: 
The square root of the floating-point operand in register Fb is written to register Fc. (The Fa 
field of this instruction must be set to a value of F31.) 


The result is rounded to the specified precision. The single-precision operation on a canonical 
single-precision value produces a canonical single-precision result. 


An invalid operation is signaled if the sign of the operand is negative. However, SQRT(-0)
produces a result of -0.
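
The special case for minus zero can be illustrated with a short sketch. It assumes T_floating operands held as Python floats, reports the invalid operation as a flag rather than a trap, and leaves rounding qualifiers and NaN operands unmodeled; the function name is invented.

```python
import math

def ieee_sqrt(x):
    """Return (result, invalid) for a square root of a Python float."""
    if x == 0.0 or math.isnan(x):
        return x, False              # +0 gives +0 and -0 gives -0; NaN handling not modeled
    if x < 0.0:
        return math.nan, True        # negative operand: invalid operation
    return math.sqrt(x), False

minus_zero_result, minus_zero_invalid = ieee_sqrt(-0.0)
negative_result, negative_invalid = ieee_sqrt(-4.0)
```

The minus-zero operand returns minus zero without signaling, while any other negative operand signals an invalid operation.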


Notes: 


• Floating-point overflow and underflow are not possible for square root operation. The
underflow enable qualifier is ignored. 
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4.10.24 VAX Floating Subtract 


Format: 
SUBx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation: 


Fc ← Fav - Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 


Instruction mnemonics: 


SUBF Subtract F_floating 
SUBG Subtract G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Description: 


The subtrahend operand in register Fb is subtracted from the minuend operand in register Fa 
and the difference is written to register Fc. 


The difference is rounded or chopped to the specified precision and then the corresponding 
range is checked for overflow/underflow. The single-precision operation on canonical sin- 
gle-precision values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true zero (that is, 
VAX reserved operands and dirty zeros trap). The contents of Fc are UNPREDICTABLE if
this occurs.


See Section 4.7.7 for details of the stored result on overflow or underflow. 
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4.10.25 IEEE Floating Subtract 


Format: 
SUBx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format 


Operation: 
Fc ← Fav - Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


SUBS Subtract S_floating 
SUBT Subtract T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Exception Completion (/S) 
Underflow Enable (/U) 
Inexact Enable (/I) 
Description: 


The subtrahend operand in register Fb is subtracted from the minuend operand in register Fa 
and the difference is written to register Fc. 


The difference is rounded to the specified precision and then the corresponding range is 
checked for overflow/underflow. The single-precision operation on canonical single-precision
values produces a canonical single-precision result.


See Section 4.7.7 for details of the stored result on overflow, underflow, or inexact result. 
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4.11 Miscellaneous Instructions 
Alpha provides the miscellaneous instructions shown in Table 4-17. 


Table 4-17: Miscellaneous Instructions Summary 


Mnemonic Operation 

AMASK Architecture Mask
CALL_PAL Call Privileged Architecture Library Routine 
ECB Evict Cache Block 

EXCB Exception Barrier 

FETCH Prefetch Data 

FETCH_M Prefetch Data, Modify Intent 
IMPLVER Implementation Version 

MB Memory Barrier 

RPCC Read Processor Cycle Counter 
TRAPB Trap Barrier 

WH64 Write Hint — 64 Bytes 

WMB Write Memory Barrier 
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4.11.1 Architecture Mask 


Format: 
AMASK Rb.rq,Rc.wq !Operate format
AMASK #b.ib,Rc.wq !Operate format
Operation: 


Rc ← Rbv AND {NOT CPU_feature_mask}


Exceptions: 


None 


Instruction mnemonics: 


AMASK Architecture Mask


Qualifiers: 


None 


Description: 


Rbv represents a mask of the requested architectural extensions. Bits are cleared that corre- 
spond to architectural extensions that are present. Reserved bits and bits that correspond to 
absent extensions are copied unchanged. In either case, the result is placed in Rc. If the result 
is zero, all requested features are present. 


Software may specify an Rbv of all 1’s to determine the complete set of architectural exten- 
sions implemented by a processor. Assigned bit definitions are located in Appendix D. 
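
The masking behavior can be sketched as below. The feature bit positions and names used here (FEATURE_A, FEATURE_B) are hypothetical placeholders; the real assignments are in Appendix D.

```python
MASK64 = (1 << 64) - 1

# Hypothetical feature bits, for illustration only.
FEATURE_A = 1 << 0
FEATURE_B = 1 << 1

def amask(rbv, cpu_feature_mask):
    """Rc <- Rbv AND {NOT CPU_feature_mask}"""
    return rbv & ~cpu_feature_mask & MASK64

cpu = FEATURE_A                                # a processor that implements only FEATURE_A
missing = amask(FEATURE_A | FEATURE_B, cpu)    # bits left set are absent features
all_present = amask(FEATURE_A, cpu) == 0       # zero result: everything requested is present
implemented = ~amask(MASK64, cpu) & MASK64     # query with all 1's, then invert
```

A zero result means every requested extension is present; inverting the result of an all-ones query recovers the implemented set.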


Ra must be R31 or the result in Rc is UNPREDICTABLE and it is UNPREDICTABLE 
whether an exception is signaled. 


Software Note: 


Use this instruction to make instruction-set decisions; use IMPLVER to make code-tuning 
decisions. 


Implementation Note: 
Instruction encoding is implemented as follows: 


• On 21064/21064A/21066/21068/21066A (EV4/EV45/LCA/LCA45 chips), AMASK
copies Rbv to Rc.


• On 21164 (EV5), AMASK copies Rbv to Rc.


• On 21164A (EV56), 21164PC (PCA56), and 21264 (EV6), AMASK correctly indicates
support for architecture extensions by copying Rbv to Rc and clearing appropriate bits.


Bits are assigned and placed in Appendix D for architecture extensions as ECOs for those 
extensions are passed. The low 8 bits are reserved for standard architecture extensions so 
they can be tested with a literal; application-specific extensions are assigned from bit 8 
upward. 
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4.11.2 Call Privileged Architecture Library 


Format: 
CALL_PAL fnc.ir !PAL format 


Operation: 
{Stall instruction issuing until all 
prior instructions are guaranteed to
complete without incurring exceptions.} 
{Trap to PALcode.} 

Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL Call Privileged Architecture Library 


Qualifiers: 


None 


Description: 


The CALL_PAL instruction is not issued until all previous instructions are guaranteed to com- 
plete without exceptions. If an exception occurs, the continuation PC in the exception stack 
frame points to the CALL_PAL instruction. The CALL_PAL instruction causes a trap to 
PALcode. 
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4.11.3 Evict Data Cache Block 


Format: 
ECB (Rb.ab) ! Misc format 


Operation: 


va ← Rbv


IF { va maps to memory space } THEN 

Prepare to reuse cache resources that are occupied by the
addressed byte.

END 


Exceptions: 


None 


Instruction mnemonics: 


ECB Evict Cache Block 


Qualifiers: 


None 


Description: 


The ECB instruction provides a hint that the addressed location will not be referenced again in 
the near future, so any cache space it occupies should be made available to cache other mem- 
ory locations. If the cache copy of the location is dirty, the processor may start writing it back; 
if the cache has multiple sets, the processor may arrange for the set containing the addressed 
byte to be the next set allocated. 


The ECB instruction does not generate exceptions; if it encounters data address translation 
errors (access violation, translation not valid, and so forth) during execution, it is treated as a 
NOP. 


If the address maps to non-memory-like (I/O) space, ECB is treated as a NOP. 


Software Note: 


• ECB makes a particular cache location available for reuse by evicting and invalidating
its contents. The intent is to give software more control over cache allocation policy in
set-associative caches so that "useful" blocks can be retained in the cache.


• ECB is a performance hint; it does not serialize the eviction of the addressed cache
block with any preceding or following memory operation.


• ECB is not intended for flushing caches prior to power failure or low power operation;
CFLUSH is intended for that purpose.


Implementation Note: 


Implementations with set-associative caches are encouraged to update their allocation 
pointer so that the next D-stream reference that misses the cache and maps to this line is 
allocated into the vacated set. 
4.11.4 Exception Barrier


Format: 


EXCB ! Memory format 


Operation: 
{EXCB does not appear to issue until completion of all 
exceptions and dependencies on the Floating-point Control 
Register (FPCR) from prior instructions.}


Exceptions: 


None 


Instruction mnemonics: 


EXCB Exception Barrier 


Qualifiers: 


None 


Description: 


The EXCB instruction allows software to guarantee that in a pipelined implementation, all pre- 
vious instructions have completed any behavior related to exceptions or rounding modes 
before any instructions after the EXCB are issued. 


In particular, all changes to the Floating-point Control Register (FPCR) are guaranteed to have 
been made, whether or not there is an associated exception. Also, all potential floating-point 
exceptions and integer overflow exceptions are guaranteed to have been taken. EXCB is thus a 
superset of TRAPB. 


If a floating-point exception occurs for which trapping is enabled, the EXCB instruction acts 
like a fault. In this case, the value of the Program Counter reported to the program may be the 
address of the EXCB instruction (or earlier) but is never the address of an instruction follow- 
ing the EXCB. 


The relationship between EXCB and the FPCR is described in Section 4.7.8.1. 
4.11.5 Prefetch Data


Format: 
FETCHx 0(Rb.ab) !Memory format


Operation: 


va ← {Rbv}
{Optionally prefetch aligned 512-byte block surrounding va.} 


Exceptions: 


None 


Instruction mnemonics: 


FETCH Prefetch Data 

FETCH_M Prefetch Data, Modify Intent 
Qualifiers: 

None 
Description: 


The virtual address is given by Rbv. This address is used to designate an aligned 512-byte 
block of data. An implementation may optionally attempt to move all or part of this block (or 
a larger surrounding block) of data to a part of the memory hierarchy that has faster access, in
anticipation of subsequent Load or Store instructions that access that data. 
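
As an illustration of the block designation only, the following Python sketch computes the bounds of the aligned 512-byte block that contains a given virtual address. The function name `fetch_block_bounds` is hypothetical, and the sketch says nothing about whether or how an implementation actually prefetches.

```python
FETCH_BLOCK = 512  # FETCHx designates an aligned 512-byte block


def fetch_block_bounds(va: int) -> tuple:
    """Return (base, limit) of the aligned 512-byte block containing va."""
    base = va & ~(FETCH_BLOCK - 1)   # clear the low 9 address bits
    return base, base + FETCH_BLOCK  # limit is exclusive


if __name__ == "__main__":
    base, limit = fetch_block_bounds(0x12345)
    print(hex(base), hex(limit))
```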


Implementation Note: 


FETCHx is intended to help software overlap memory latencies when such latencies are 
on the order of at least 100 cycles. FETCHx is unlikely to help (or be implemented) for 
significantly shorter memory latencies. Code scheduling and cache-line prefetching (see
Appendix A.3.5) should be used to overlap such shorter latencies. 


Existing Alpha implementations (through the 21264) have memory latencies that are too 
short to profitably implement FETCHx. Therefore, FETCHx does not improve memory 
performance in existing Alpha implementations. 


The FETCH instruction is a hint to the implementation that may allow faster execution. An 
implementation is free to ignore the hint. If prefetching is done in an implementation, the 
order of fetch within the designated block is UNPREDICTABLE. 


The FETCH_M instruction gives the additional hint that modifications (stores) to some or all 
of the data block are anticipated. 
No exceptions are generated by FETCHx. If a Load (or Store in the case of FETCH_M) that
uses the same address would fault, the prefetch request is ignored. It is UNPREDICTABLE 
whether a TB-miss fault is ever taken by FETCHx. 


Implementation Note: 


Implementations are encouraged to take the TB-miss fault, then continue the prefetch. 




4.11.6 Implementation Version 


Format: 
IMPLVER Rc !Operate format


Operation: 
Rc ← value, which is defined in Appendix D


Exceptions: 


None 


Instruction mnemonics: 


IMPLVER Implementation Version 


Description: 


A small integer is placed in Rc that specifies the major implementation version of the proces- 
sor on which it is executed. This information can be used to make code-scheduling or tuning 
decisions, or the information can be used to branch to different pieces of code optimized for 
different implementations. 


Notes: 


• The value returned by IMPLVER does not identify the particular processor type.
Rather, it identifies a group of processors that can be treated similarly for performance
characteristics such as scheduling. Ra must be R31 and Rb must be the literal #1;
otherwise the result in Rc is UNPREDICTABLE and it is UNPREDICTABLE whether an
exception is signaled.


Software Note: 


Use this instruction to make code-tuning decisions; use AMASK to make instruction-set 
decisions. 
4.11.7 Memory Barrier
Format: 


MB !Memory format 


Operation: 


{Guarantee that all subsequent loads or stores 
will not access memory until after all previous 
loads and stores have accessed memory, as 
observed by other processors. } 


Exceptions: 


None 


Instruction mnemonics: 


MB Memory Barrier 


Qualifiers: 


None 


Description: 


The use of the Memory Barrier (MB) instruction is required only in multiprocessor systems. 


In the absence of an MB instruction, loads and stores to different physical locations are 
allowed to complete out of order on the issuing processor as observed by other processors. 
The MB instruction allows memory accesses to be serialized on the issuing processor as 
observed by other processors. See Chapter 5 for details on using the MB instruction to serial- 
ize these accesses. Chapter 5 also details coordinating memory accesses across processors. 


Note that MB ensures serialization only; it does not necessarily accelerate the progress of 
memory operations. 
4.11.8 Read Processor Cycle Counter


Format: 
RPCC Ra.wq !Memory format 


Operation: 


Ra ← {cycle counter}


Exceptions: 


None 


Instruction mnemonics: 


RPCC Read Processor Cycle Counter 


Qualifiers: 


None 


Description: 


Register Ra is written with the processor cycle counter (PCC). The PCC register consists of 
two 32-bit fields. The low-order 32 bits (PCC<31:0>) are an unsigned, wrapping counter, 
PCC_CNT. The high-order 32 bits (PCC<63:32>), PCC_OFF, are operating-system depen- 
dent in their implementation. 


See Section 3.1.5 for a description of the PCC. 


If an operating system uses PCC_OFF to calculate the per-process or per-thread cycle count, 
that count must be derived from the 32-bit sum of PCC_OFF and PCC_CNT. The following 
example computes that cycle count, modulo 2**32, and returns the count value in R0. Notice
the care taken not to cause an unwanted sign extension.


RPCC    R0              ; Read the process cycle counter
SLL     R0, #32, R1     ; Line up the offset and count fields
ADDQ    R0, R1, R0      ; Do add
SRL     R0, #32, R0     ; Zero extend the count to 64 bits


The following example code returns the value of PCC_CNT in R0<31:0> and all zeros in
R0<63:32>.


RPCC    R0
ZAPNOT  R0, #15, R0
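
The arithmetic of the two sequences above can be checked with a small Python model. This is a sketch only: the function names are hypothetical, the 64-bit PCC value is passed in as a plain integer rather than read from hardware, and 64-bit register wraparound is imitated by masking.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def process_cycle_count(pcc: int) -> int:
    """Mirror the RPCC/SLL/ADDQ/SRL sequence on a 64-bit PCC value."""
    r0 = pcc & MASK64
    r1 = (r0 << 32) & MASK64   # SLL: PCC_CNT moves up to bits <63:32>
    r0 = (r0 + r1) & MASK64    # ADDQ: bits <63:32> hold (PCC_OFF + PCC_CNT) mod 2**32
    return r0 >> 32            # SRL: logical shift zero-extends the 32-bit sum


def pcc_cnt(pcc: int) -> int:
    """Mirror ZAPNOT R0, #15, R0: keep bytes 0 to 3 and clear bytes 4 to 7."""
    return pcc & MASK32


if __name__ == "__main__":
    sample = (0xFFFF_FFF0 << 32) | 0x20   # PCC_OFF = 0xFFFFFFF0, PCC_CNT = 0x20
    print(hex(process_cycle_count(sample)), hex(pcc_cnt(sample)))
```

In the sample value the sum of the two fields overflows 32 bits, so the model returns the wrapped count; the logical right shift is what prevents a set bit 63 from being propagated as a sign.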
\Internal Implementation Note:


An implementation-dependent mechanism must exist that, when enabled, causes the RPCC
instruction always to return a zero in Ra. This mechanism must be usable by privileged system
software.\
4.11.9 Trap Barrier


Format: 
TRAPB !Memory format 


Operation: 


{TRAPB does not appear to issue until all prior instructions 
are guaranteed to complete without causing any arithmetic traps}. 


Exceptions: 


None 


Instruction mnemonics: 


TRAPB Trap Barrier 


Qualifiers: 


None 


Description: 


The TRAPB instruction allows software to guarantee that in a pipelined implementation, all 
previous arithmetic instructions will complete without incurring any arithmetic traps before 
the TRAPB or any instructions after it are issued. 


If an arithmetic exception occurs for which trapping is enabled, the TRAPB instruction acts
like a fault. In this case, the value of the Program Counter reported to the program may be the 
address of the TRAPB instruction (or earlier) but is never the address of the instruction follow- 
ing the TRAPB. 


This fault behavior by TRAPB allows software, using one TRAPB instruction for each excep- 
tion domain, to isolate the address range in which an exception occurs. If the address of the 
instruction following the TRAPB were allowed, there would be no way to distinguish an 
exception in the address range preceding a label from an exception in the range that includes 
the label along with the faulting instruction and a branch back to the label. This case arises 
when the code is not following exception completion rules but is inserting TRAPB instruc- 
tions to isolate exceptions to the proper scope. 


Use of TRAPB should be compared with use of the EXCB instruction; see Section 4.11.4. 
4.11.10 Write Hint


Format: 
WH64 (Rb.ab) ! Misc format 


Operation: 


va ← Rbv

IF { va maps to memory space } THEN 

Write UNPREDICTABLE data to the aligned 64-byte region 
containing the addressed byte. 

END 


Exceptions: 


None 


Instruction mnemonics: 


WH64 Write Hint - 64 Bytes 


Qualifiers: 


None 


Description: 


The WH64 instruction provides a hint that the current contents of the aligned 64-byte block 
containing the addressed byte will never be read again but will be overwritten in the near 
future. 


The processor may allocate cache resources to hold the block without reading its previous con- 
tents from memory; the contents of the block may be set to any value that does not introduce a 
security hole, as described in Section 1.6.3. 


The WH64 instruction does not generate exceptions; if it encounters data address translation 
errors (access violation, translation not valid, and so forth), it is treated as a NOP. 


If the address maps to non-memory-like (I/O) space, WH64 is treated as a NOP. 


Software Note: 


This instruction is a performance hint that should be used when writing a large continuous 
region of memory. The intended code sequence consists of one WH64 instruction 
followed by eight quadword stores for each aligned 64-byte region to be written. 


Sometimes, the UNPREDICTABLE data will exactly match some or all of the previous 
contents of the addressed block of memory. \On EV4 and EV5, it always matches for the 
entire block.\ 
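
Because WH64 may leave UNPREDICTABLE data in the whole aligned 64-byte region, it follows from the Operation section that software would want to issue it only for regions that will be overwritten completely. The Python sketch below lists those regions for a buffer that need not be aligned. The function name `wh64_blocks` and the example addresses are hypothetical; bytes before the first and after the last listed block would be written with ordinary stores and no WH64.

```python
BLOCK = 64  # WH64 operates on aligned 64-byte regions


def wh64_blocks(start: int, length: int) -> list:
    """Base addresses of aligned 64-byte blocks lying entirely inside [start, start + length)."""
    end = start + length
    first = (start + BLOCK - 1) & ~(BLOCK - 1)  # round start up to a 64-byte boundary
    last = end & ~(BLOCK - 1)                   # round end down to a 64-byte boundary
    return list(range(first, last, BLOCK))      # empty when no block fits entirely


if __name__ == "__main__":
    for base in wh64_blocks(0x1010, 200):
        # One WH64 for the block, followed by eight quadword stores.
        print("WH64", hex(base), "then stores at", [hex(base + 8 * q) for q in range(8)])
```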
Implementation Note:
If the 64-byte region containing the addressed byte is not in the data cache, 
implementations are encouraged to allocate the region in the data cache without first 
reading it from memory. However, if any of the addressed bytes exist in the caches of 
other processors, they must be kept coherent with respect to those processors. 


Processors with cache blocks smaller than 64 bytes are encouraged to implement WH64 
as defined. However, they may instead implement the instruction by allocating a smaller 
aligned cache block for write access or by treating WH64 as a NOP. 


Processors with cache blocks larger than 64 bytes are also encouraged to implement 
WH64 as defined. However, they may instead treat WH64 as a NOP. 
4.11.11 Write Memory Barrier


Format: 


WMB !Memory format 


Operation: 


{ Guarantee that
{ All preceding stores that access memory-like
{ regions are ordered before any subsequent stores
{ that access memory-like regions and
{ All preceding stores that access non-memory-like
{ regions are ordered before any subsequent stores
{ that access non-memory-like regions.


Exceptions: 


None 


Instruction mnemonics: 


WMB Write Memory Barrier 


Qualifiers: 


None 


Description: 


The WMB instruction provides a way for software to control write buffers. It guarantees that 
writes preceding the WMB are not aggregated with writes that follow the WMB. 


WMB guarantees that writes to memory-like regions that precede the WMB are ordered 
before writes to memory-like regions that follow the WMB. Similarly, WMB guarantees that 
writes to non-memory-like regions that precede the WMB are ordered before writes to 
non-memory-like regions that follow the WMB. It does not order writes to memory-like 
regions relative to writes to non-memory-like regions. 


WMB causes writes that are contained in buffers to be completed without unnecessary delay. 
It is particularly suited for batching writes to high-performance I/O devices. 


WMB prevents writes that precede the WMB from being merged with writes that follow the 
WMB. In particular, two writes that access the same location and are separated by a WMB 
cause two distinct and ordered write events. 


In the absence of a WMB (or IMB or MB) instruction, stores to memory-like or non-mem- 
ory-like regions can be aggregated and/or buffered and completed in any order. 
The WMB instruction is the preferred method for providing high-bandwidth write streams
where order must be preserved between writes in that stream. 


\Implementation Notes: 


Hardware designers should provide implementations of the WMB instruction that do not stall but 
that continue issuing subsequent instructions, including loads and stores. Hardware designs 
should complete all writes contained in buffers at the time a WMB instruction is executed with- 
out unnecessary delay.\ 


Notes: 


WMB is useful for ordering streams of writes to a non-memory-like region, such as to mem- 
ory-mapped control registers or to a graphics frame buffer. While both MB and WMB can 
ensure that writes to a non-memory-like region occur in order, without being aggregated or 
reordered, the WMB is usually faster and is never slower than MB. 


WMB can correctly order streams of writes in programs that operate on shared sections of data 
if the data in those sections are protected by a classic semaphore protocol. The following 
example illustrates such a protocol: 


Processor i                Processor j

<Acquire lock>
MB
<Read and write data
in shared section>
WMB
<Release lock>      =>     <Acquire lock>
                           MB
                           <Read and write data
                           in shared section>
                           WMB


The example above is similar to that in Section 5.5.4, except a WMB is substituted for the sec- 
ond MB in the lock-update-release sequence. It is correct to substitute WMB for the second 
MB only if: 


1. All data locations that are read or written in the critical section are accessed only after 
acquiring a software lock by using lock_variable (and before releasing the software 
lock). 


2. For each read u of shared data in the critical section, there is a write v such that:
   a. v is BEFORE the WMB
   b. v follows u in processor issue sequence (see Section 5.6.1.1)
   c. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1), or
      both.
3. Both lock_variable and all the shared data are in memory-like regions (or lock_variable
and all the shared data are in non-memory-like regions). If the lock_variable is in a 
non-memory-like region, the atomic lock protocol must use some implementation-spe- 
cific hardware support. 


The substitution of a WMB for the second MB is usually faster and never slower. 
4.12 VAX Compatibility Instructions


Alpha provides the instructions shown in Table 4-18 for use in translated VAX code. These
instructions are not a permanent part of the architecture and will not be available in some 
future implementations. They are intended to preserve customer assumptions about VAX 
instruction atomicity in porting code from VAX to Alpha. 


These instructions should be generated only by the VAX-to-Alpha software translator; they 
should never be used in native Alpha code. Any native code that uses them may cease to work. 


Table 4-18: VAX Compatibility Instructions Summary 


Mnemonic    Operation
RC          Read and Clear
RS          Read and Set
4.12.1 VAX Compatibility Instructions


Format: 


Rx Ra.wq !Memory format 


Operation: 
Ra ← intr_flag
intr_flag ← 0 !RC
intr_flag ← 1 !RS
Exceptions: 


None 


Instruction mnemonics: 


RC Read and Clear 
RS Read and Set 
Qualifiers: 
None 
Description: 


The intr_flag is returned in Ra and then cleared to zero (RC) or set to one (RS). 


These instructions may be used to determine whether the sequence of Alpha instructions 
between RS and RC (corresponding to a single VAX instruction) was executed without inter- 
ruption or exception. 


Intr_flag is a per-processor state bit. The intr_flag is cleared if that processor encounters a 
CALL_PAL REI instruction. 


It is UNPREDICTABLE whether a processor’s intr_flag is affected when that processor exe- 
cutes an LDx_L or STx_C instruction. A processor’s intr_flag is not affected when that 
processor executes a normal load or store instruction. 


A processor’s intr_flag is not affected when that processor executes a taken branch. 
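
The behavior described above can be pictured with a toy Python model of one processor's intr_flag. This is a sketch under assumptions: the class and function names are hypothetical, an interruption is represented only by the CALL_PAL REI that ends it, and reading a value of 1 from RC is taken to mean that the bracketed sequence ran without interruption.

```python
class IntrFlagModel:
    """Toy model of the per-processor intr_flag state bit."""

    def __init__(self):
        self.intr_flag = 0

    def rs(self) -> int:
        """RS: return the current flag, then set it to one."""
        old = self.intr_flag
        self.intr_flag = 1
        return old

    def rc(self) -> int:
        """RC: return the current flag, then clear it to zero."""
        old = self.intr_flag
        self.intr_flag = 0
        return old

    def call_pal_rei(self) -> None:
        """CALL_PAL REI clears the flag."""
        self.intr_flag = 0


def ran_uninterrupted(cpu: IntrFlagModel, interrupted: bool) -> bool:
    cpu.rs()                  # start of the translated VAX instruction
    if interrupted:
        cpu.call_pal_rei()    # return from an interrupt or exception handler
    return cpu.rc() == 1      # flag still set means no REI occurred in between


if __name__ == "__main__":
    print(ran_uninterrupted(IntrFlagModel(), interrupted=False))
    print(ran_uninterrupted(IntrFlagModel(), interrupted=True))
```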


Notes: 


• These instructions are intended only for use by the VAX-to-Alpha software translator;
they should never be used by native code.
4.13 Multimedia (Graphics and Video) Support


Alpha provides the following instructions that enhance support for graphics and video
algorithms:

Mnemonic Operation 
MINUB8 Vector Unsigned Byte Minimum 
MINSB8 Vector Signed Byte Minimum 
MINUW4 Vector Unsigned Word Minimum 
MINSW4 Vector Signed Word Minimum 
MAXUB8 Vector Unsigned Byte Maximum 
MAXSB8 Vector Signed Byte Maximum 
MAXUW4 Vector Unsigned Word Maximum 
MAXSW4 Vector Signed Word Maximum 
PERR Pixel Error
PKLB Pack Longwords to Bytes 
PKWB Pack Words to Bytes 
UNPKBL Unpack Bytes to Longwords 
UNPKBW Unpack Bytes to Words 


The MIN and MAX instructions allow the clamping of pixel values to maximum values that
are allowed in different standards and stages of the CODECs. 


The PERR instruction accelerates the macroblock search in motion estimation. 


The pack and unpack (PKxB and UNPKBxy) instructions accelerate the blocking of interleaved 
YUV coordinates for processing by the CODEC. 


Implementation Note: 


Alpha processors for which the AMASK instruction returns bit 8 set implement these 
instructions. Those processors for which AMASK does not return bit 8 set can take an 
Illegal Instruction trap, and software can emulate their function, if required. 


\Processors subsequent to EV6 should be allowed to make implementation optional; 
therefore, these instructions should be subsetted after EV6.\ 
4.13.1 Byte and Word Minimum and Maximum


Format: 


MINxxx Ra.rq,Rb.rq,Rc.wq ! Operate Format


MAXxxx Ra.rq,#b.ib,Rc.wq ! Operate Format 


Operation: 


CASE
  MINUB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MINU(Rav<i*8+7:i*8>,Rbv<i*8+7:i*8>)
    END
  MINSB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MINS(Rav<i*8+7:i*8>,Rbv<i*8+7:i*8>)
    END
  MINUW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MINU(Rav<i*16+15:i*16>,Rbv<i*16+15:i*16>)
    END
  MINSW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MINS(Rav<i*16+15:i*16>,Rbv<i*16+15:i*16>)
    END
  MAXUB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MAXU(Rav<i*8+7:i*8>,Rbv<i*8+7:i*8>)
    END
  MAXSB8:
    FOR i FROM 0 TO 7
      Rcv<i*8+7:i*8> = MAXS(Rav<i*8+7:i*8>,Rbv<i*8+7:i*8>)
    END
  MAXUW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MAXU(Rav<i*16+15:i*16>,Rbv<i*16+15:i*16>)
    END
  MAXSW4:
    FOR i FROM 0 TO 3
      Rcv<i*16+15:i*16> = MAXS(Rav<i*16+15:i*16>,Rbv<i*16+15:i*16>)
    END
ENDCASE


Exceptions:


None


Exceptions: 


None 
Instruction mnemonics:


MINUB8 Vector Unsigned Byte Minimum 
MINSB8 Vector Signed Byte Minimum 
MINUW4 Vector Unsigned Word Minimum 
MINSW4 Vector Signed Word Minimum 
MAXUB8 Vector Unsigned Byte Maximum 
MAXSB8 Vector Signed Byte Maximum 
MAXUW4 Vector Unsigned Word Maximum 
MAXSW4 Vector Signed Word Maximum 
Qualifiers: 
None 
Description: 


For MINxB8, each byte of Rc is written with the smaller of the corresponding bytes of Ra or 
Rb. The bytes may be interpreted as signed or unsigned values. 


For MINxW4, each word of Rc is written with the smaller of the corresponding words of Ra 
or Rb. The words may be interpreted as signed or unsigned values. 


For MAXxB8, each byte of Rc is written with the larger of the corresponding bytes of Ra or 
Rb. The bytes may be interpreted as signed or unsigned values. 


For MAXxW4, each word of Rc is written with the larger of the corresponding words of Ra or 
Rb. The words may be interpreted as signed or unsigned values. 
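
The difference between the signed and unsigned forms is easiest to see on concrete values. The following Python sketch models the lane-by-lane selection on 64-bit integers; all function names are hypothetical, and the model covers only the register-to-register data flow, not instruction encoding or the literal operand form.

```python
def _lanes(value: int, width: int) -> list:
    """Split a 64-bit value into lanes of the given width, lowest lane first."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 64, width)]


def _signed(x: int, width: int) -> int:
    """Interpret a lane bit pattern as a two's-complement value."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


def vector_minmax(ra: int, rb: int, width: int, signed: bool, pick) -> int:
    """pick is min or max; width is 8 for the xB8 forms or 16 for the xW4 forms."""
    key = (lambda x: _signed(x, width)) if signed else (lambda x: x)
    rc = 0
    for i, (a, b) in enumerate(zip(_lanes(ra, width), _lanes(rb, width))):
        rc |= pick(a, b, key=key) << (i * width)  # store the chosen bit pattern
    return rc


def minub8(ra, rb): return vector_minmax(ra, rb, 8, False, min)
def minsb8(ra, rb): return vector_minmax(ra, rb, 8, True, min)
def maxuw4(ra, rb): return vector_minmax(ra, rb, 16, False, max)
def maxsw4(ra, rb): return vector_minmax(ra, rb, 16, True, max)


if __name__ == "__main__":
    # Byte 0 is 0xFF versus 0x01: the smaller value depends on the interpretation.
    print(hex(minub8(0x01FF, 0x0201)), hex(minsb8(0x01FF, 0x0201)))
```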
4.13.2 Pixel Error


Format: 


PERR Ra.rq,Rb.rq,Rc.wq ! Operate Format


Operation: 
temp = 0
FOR i FROM 0 TO 7
  IF { Rav<i*8+7:i*8> GEU Rbv<i*8+7:i*8>} THEN
    temp ← temp + (Rav<i*8+7:i*8> - Rbv<i*8+7:i*8>)
  ELSE
    temp ← temp + (Rbv<i*8+7:i*8> - Rav<i*8+7:i*8>)
END
Rc ← temp


Exceptions: 


None 


Instruction mnemonics: 


PERR Pixel Error 


Qualifiers: 


None 


Description: 


The absolute value of the difference between each of the bytes in Ra and Rb is calculated. The 
sum of the resulting bytes is written to Rc. 
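
As a cross-check of the Operation section, the following Python sketch computes the same sum of absolute byte differences on two 64-bit integers. The function name `perr` is used here only for illustration; the model is not tied to any particular implementation.

```python
def perr(ra: int, rb: int) -> int:
    """Sum of absolute differences of the eight unsigned bytes of ra and rb."""
    total = 0
    for i in range(8):
        a = (ra >> (8 * i)) & 0xFF
        b = (rb >> (8 * i)) & 0xFF
        total += abs(a - b)   # same result as the GEU test and subtract
    return total


if __name__ == "__main__":
    print(perr(0x0A00FF, 0x050100))
```

The largest possible result is 8 * 255 = 2040, so the sum always fits comfortably in the destination register.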
4.13.3 Pack Bytes


Format: 
PKxB Rb.rq,Rc.wq ! Operate Format


Operation: 


CASE
  PKLB:
    BEGIN
      Rc<07:00> ← Rbv<07:00>
      Rc<15:08> ← Rbv<39:32>
      Rc<63:16> ← 0
    END
  PKWB:
    BEGIN
      Rc<07:00> ← Rbv<07:00>
      Rc<15:08> ← Rbv<23:16>
      Rc<23:16> ← Rbv<39:32>
      Rc<31:24> ← Rbv<55:48>
      Rc<63:32> ← 0
    END
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


PKLB Pack Longwords to Bytes 
PKWB Pack Words to Bytes 
Qualifiers: 
None 
Description: 


For PKLB, the component longwords of Rb are truncated to bytes and written to the lower two 
byte positions of Rc. The upper six bytes of Rc are written with zero. 


For PKWB, the component words of Rb are truncated to bytes and written to the lower four 
byte positions of Rc. The upper four bytes of Rc are written with zero. 
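
The bit-field assignments above can be modeled in a few lines of Python. The sketch below operates on plain 64-bit integers, and the function names are hypothetical.

```python
def pklb(rb: int) -> int:
    """Pack the low byte of each of the two longwords into Rc<15:0>."""
    return (rb & 0xFF) | (((rb >> 32) & 0xFF) << 8)


def pkwb(rb: int) -> int:
    """Pack the low byte of each of the four words into Rc<31:0>."""
    rc = 0
    for i in range(4):
        rc |= ((rb >> (16 * i)) & 0xFF) << (8 * i)
    return rc


if __name__ == "__main__":
    value = 0x1234_5678_9ABC_DEF0
    print(hex(pklb(value)), hex(pkwb(value)))
```

Truncation simply discards the upper bits of each component; no saturation is applied.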
4.13.4 Unpack Bytes


Format: 
UNPKBx Rb.rq,Rc.wq ! Operate Format 


Operation: 


temp = 0 
CASE 
UNPKBL : 
BEGIN 
temp<07:00> = Rbv<07:00> 
temp<39:32> = Rbv<15:08> 
END 
UNPKBW: 
BEGIN 
temp<07:00> = Rbv<07:00> 
temp<23:16> = Rbv<15:08> 
temp<39:32> = Rbv<23:16> 
temp<55:48> = Rbv<31:24> 
END 
ENDCASE 
Rc ← temp


Exceptions: 


None 


Instruction mnemonics: 


UNPKBL Unpack Bytes to Longwords 
UNPKBW Unpack Bytes to Words 
Qualifiers: 
None 
Description: 


For UNPKBL, the lower two component bytes of Rb are zero-extended to longwords. The 
resulting longwords are written to Rc. 


For UNPKBW, the lower four component bytes of Rb are zero-extended to words. The result- 
ing words are written to Rc. 
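
A matching Python sketch for the unpack operations follows. As before, the function names are hypothetical and the model works on plain 64-bit integers.

```python
def unpkbl(rb: int) -> int:
    """Zero-extend the two low bytes of rb into the two longwords of Rc."""
    return (rb & 0xFF) | (((rb >> 8) & 0xFF) << 32)


def unpkbw(rb: int) -> int:
    """Zero-extend the four low bytes of rb into the four words of Rc."""
    rc = 0
    for i in range(4):
        rc |= ((rb >> (8 * i)) & 0xFF) << (16 * i)
    return rc


if __name__ == "__main__":
    print(hex(unpkbl(0x78F0)), hex(unpkbw(0x3478BCF0)))
```

Bytes of Rb above the ones named in the Operation section do not contribute to the result.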
4.14 \Revision History


Revision 7.0, November 10, 1997 


1. Added ECO 94, AMASK and IMPLVER instructions
2. Added ECO 88, int --> f-p; f-p --> int move
3. Added ECO 90, graphics and video instructions
4. Added ECO 81, byte/word/sext instructions
5. Alpha AXP --> Alpha
6. Added ECO 103 and 104, floating-point changes
7. Added ECO 109, exception handling revision
8. Changed /S from meaning Software to meaning Exception Completion


Revision 6.0, December 1994


1. Added ECO 82, IEEE X_floating
2. Added ECO 76, Bi-endian support
3. Added ECO 69, RPCC
4. Added description for MAP_F and MAP_S in LDF and LDS instructions
5. Alpha --> Alpha AXP
6. Added ECO 70, Operand clarification for Mx_FPCR instructions
7. Added ECO 65, trap shadow rules
8. Made instruction format notation to agree with Programming Quick Ref
9. From A_SRM note 158.3, better unaligned sequences
10. Added ECO 63, IEEE signal NaN correction
11. Added ECO 58, STx_C usage restriction and subsequent notes regarding it
12. Added EXCB instruction
13. From CVTLQ instructions, removed reference to /S qualifier
14. Add WMB and CVTST instructions
15. Add floating-point function code field documentation
16. For EXTxx example correct comment (R1 = ssss DCBA)
17. Made CTVTS instruction specific as T-->S
18. From A_SRM note 155.5, changed EXTxL SLL, SRA --> ADDL
19. Correct bit 63 description of FPCR
20. From 4.7.5.1, at trap summary point to exception summary parameter/register
21. Edit CVTTS instruction to not preclude CTTST


Revision 5.0, May 12, 1992 


1. Added ECO #41 to LDx_C and format style change
2. Changed DRAINT to TRPB
3. Converted to SDML
4. Modified description of MULQ to specify that operands and result are signed
5. Removed FCMOV and CVTLQ from instructions that set FPCR bits
6. Changed byte mask for INSxx and MSKxx instructions to 16 bit value


Revision 4.0, March 29, 1991 


1. Added Scaled Add and Subtract
2. Added FPCR register and accompanying text
3. Bits <13:0> of branch displacement field in RET and JSR_COROUTINE reserved to
    DIGITAL software
4. Removed references to D_floating point
5. Clarified floating-point subset requirements and added OpenVMS requirements for FP
    regs and T_floating memory ops in implementation without floating-point support
6. Make TEST a dyadic operator with explicit condition argument
7. Fix ADDQ to allow literal as second operand, not first
8. Add format type to Arithmetic and Logical and shift Instructions
9. Rename operator ARITH_SHIFT to ARITH_RIGHT_SHIFT and upgrade description
10. Add description of how to derive upper 64 bits of product (using UMULH) to MULQ
    description
11. Add requirement that F_, D_, and G_floating operate Instructions materialize a true
    zero
12. Clarify expressions for MAX F_, D_, G_, S_, and T_ values
13. Reorder special values table in floating-point encodings section
14. Modify MB description to indicate that MB works only on instructions from issuing
    processor
15. Disambiguate between instances when floating disabled faults and illegal instruction
    traps are taken
16. Clarify that low order bits are returned on integer overflow arithmetic conversion traps
17. Add description to STx_C Instruction that clarifies implementation requirements for
    execution of STxC Instruction
18. Correct decimal value given for MIN T_floating
19. Impose uniform usage of CASE pseudocode construct
20. Insert spaces into long hex and binary values to improve legibility
21. Added optimized sign-extended byte load code fragment to code examples in Extract
    Byte Instruction description
22. Clarify use and significance of X+C notation in code examples for Extract Byte
    Instruction
23. Clarify note describing how a Read For Ownership cache coherency protocol can affect
    LDx_L/STx_C sequence
24. Change reference in Floating-Point Operate Format Instructions from 'floating-point
    arithmetic operations' to 'floating-point operate Instructions'
25. Rename RCC instruction to 'Read Process Cycle Counter' and modify definition
26. Changed values of displacement bits <13:0> in 'Jump To Subroutine' instruction to
    indicate that all values from 0010 to 1111 are reserved to DIGITAL
27. Removed text in Longword Add instruction that described carry detection
28. Specified overflow bits returned for Longword Multiply
29. Removed text in Longword Subtract instruction that described carry detection


Revision 3.0, March 2, 1990 


1. Rename GOTO to BR, and JSRs to JMP, JSB, RET
2. Rename MSKxx to ZAPxx
3. Remove CVTFQ, and CMPFxx
4. Remove CVT float-to-longword; add CVTQL/LQ
5. Make non-canonical longword +-* well-defined
6. Rename memory-format JSR to BSR
7. Add VAX compatibility Instructions RC, RS
8. Add Fetch and Fetch_M
9. Add low bit set and clear cmoves
10. Remove Nudge
11. Add longword lock Instructions
12. Remove longword load address Instructions
13. Add quadword load address high
14. Rework the LDx/L description
15. Change EXTxx/INSxx back to V1.0 SRM EXTxx/INSxx/MRGxx
16. Change floating-point exception behavior back to V1.0 SRM behavior


Revision 2.0, October 4, 1989 


1. Add TLE provided comment on emulation of Instructions
2. Change shift range from 0..64 to 0..63
3. Remove FASx, SWP, FREEZE, THAW Instructions
4. Add load lock and store conditional Instructions
5. Remove WAIT/WAITFE Instructions
6. Change DRAIN to DRAINT and only drain for arithmetic traps
7. Add memory barrier and nudge Instructions
8. Rework Floating-point exceptions
9. Add cycle counter
Revision 1.0, May 23, 1989


1. Rework Floating-point to be unmoded
2. Remove subsetting of integer MUL
3. Remove integer DIV
4. Add Freeze and Thaw
5. Rename Lock/Unlock to SWP and FASx and remove long version of lock
6. Add conditional move
7. Add branch on low bit branches (BLBS/BLBC)
8. Add WAIT/WAITFE Instructions


Revision 0.0, March 15, 1989 


1. Initial Version\ 
Chapter 5


System Architecture and Programming Implications (I)


5.1 Introduction 


Portions of the Alpha architecture have implications for programming, and the system struc- 
ture, of both uniprocessor and multiprocessor implementations. Architectural implications 
considered in the following sections are: 


• Physical address space behavior
• Caches and write buffers
• Translation buffers and virtual caches
• Data sharing
• Read/write ordering
• Arithmetic traps


To meet the requirements of the Alpha architecture, software and hardware implementors need 
to take these issues into consideration. 


5.2 Physical Address Space Characteristics 


Alpha physical address space is divided into four equal-size regions. The regions are delin- 
eated by the two most significant, implemented, physical address bits. Each region’s 
characteristics are distinguished by the coherency, granularity, and width of memory accesses, 
and whether the region exhibits memory-like behavior or non-memory-like behavior. 
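
To make the region selection concrete, the following Python sketch extracts the region number from a physical address. The function name `region_index` and the parameter `implemented_bits` are hypothetical, and the 34-bit width in the example is chosen only for illustration; the number of implemented physical address bits is implementation specific.

```python
def region_index(pa: int, implemented_bits: int) -> int:
    """Return 0..3, taken from the two most significant implemented address bits."""
    return (pa >> (implemented_bits - 2)) & 0b11


if __name__ == "__main__":
    # With 34 implemented bits, bits <33:32> select the region.
    for pa in (0x0_FFFF_FFFF, 0x1_0000_0000, 0x3_0000_0000):
        print(hex(pa), "-> region", region_index(pa, 34))
```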


5.2.1 Coherency of Memory Access 


Alpha implementations must provide a coherent view of memory, in which each write by a
processor or I/O device (hereafter, called "processor") becomes visible to all other processors. 
No distinction is made between coherency of "memory space" and "I/O space." 
Memory coherency may be provided in different ways for each of the four physical address
regions. 


Possible per-region policies include, but are not restricted to: 
• No caching


No copies are kept of data in a region; all reads and writes access the actual data
location (memory or I/O register), but a processor may elide multiple accesses to the
same data (see Section 5.2.3).


• Write-through caching


Copies are kept of any data in the region; reads may use the copies, but writes update
the actual data location and either update or invalidate all copies.


• Write-back caching


Copies are kept of any data in the region; reads and writes may use the copies, and
writes use additional state to determine whether there are other copies to invalidate or
update.


Software/Hardware Note: 


To produce separate and distinct accesses to a specific location, the location must be a 
region with no caching and a memory barrier instruction must be inserted between 
accesses. See Section 5.2.3.


Part of the coherency policy implemented for a given physical address region may include 
restrictions on excess data transfers (performing more accesses to a location than is necessary 
to acquire or change the location’s value) or may specify data transfer widths (the granularity 
used to access a location). 


Independent of coherency policy, a processor may use different hardware or different hard- 
ware resource policies for caching or buffering different physical address regions. 


5.2.2 Granularity of Memory Access 


For each region, an implementation must support aligned quadword access and may optionally 
support aligned longword access or byte access. If byte access is supported in a region, aligned 
word access and aligned longword access are also supported. 


For a quadword access region, accesses to physical memory must be implemented such that 
independent accesses to adjacent aligned quadwords produce the same results regardless of the 
order of execution. Further, an access to an aligned quadword must be done in a single atomic 
operation. 


For a longword access region, accesses to physical memory must be implemented such that 
independent accesses to adjacent aligned longwords produce the same results regardless of the 
order of execution. Further, an access to an aligned longword must be done in a single atomic 
operation, and an access to an aligned quadword must also be done in a single atomic 
operation. 

For a byte access region, accesses to physical memory must be implemented such that independent
accesses to adjacent bytes or adjacent aligned words produce the same results, regardless
of the order of execution. Further, an access to a byte, an aligned word, an aligned longword, 
or an aligned quadword must be done in a single atomic operation. 


In this context, "atomic" means that the following is true if different processors do simulta- 
neous reads and writes of the same data: 


• The result of any set of writes must be the same as if the writes had occurred sequentially
in some order, and

• Any read that observes the effect of a write on some part of memory must observe the
effect of that write (or of a later write or writes) on the entire part of memory that is
accessed by both the read and the write.


When a write accesses only part of a given word, longword, or quadword, a read of the entire 
structure may observe the effect of that partial write without observing the effect of an earlier 
write of another byte or bytes to the same structure. See Sections 5.6.1.5 and 5.6.1.6. 


5.2.3 Width of Memory Access 


Subject to the granularity, ordering, and coherency constraints given in Sections 5.2.1, 5.2.2, 
and 5.6, accesses to physical memory may be freely cached, buffered, and prefetched. 


A processor may read more physical memory data (such as a full cache block) than is actually 
accessed, writes may trigger reads, and writes may write back more data than is actually 
updated. A processor may elide multiple reads and/or writes to the same data. 


5.2.4 Memory-Like and Non-Memory-Like Behavior 


Memory-like regions obey the following rules: 


• Each page frame in the region either exists in its entirety or does not exist in its entirety;
there are no holes within a page frame.

• All locations that exist are read/write.

• A write to a location followed by a read from that location returns precisely the bits
written; all bits act as memory.

• A write to one location does not change any other location.

• Reads have no side effects.

• Longword access granularity is provided, and if the byte/word extension is implemented,
byte access granularity is provided.

• Instruction-fetch is supported.

• Load-locked and store-conditional are supported.

Non-memory-like regions may have much more arbitrary behavior:

• Unimplemented locations or bits may exist anywhere.

• Some locations or bits may be read-only and others write-only.

• Address ranges may overlap, such that a write to one location changes the bits read
from a different location.

• Reads may have side effects, although this is strongly discouraged.

• Longword granularity need not be supported and, even if the byte/word extension is
implemented, byte access granularity need not be implemented.

• Instruction-fetch need not be supported.

• Load-locked and store-conditional need not be supported.


Hardware/Software Coordination Note: 


The details of such behavior are outside the scope of the Alpha architecture. Specific 
processor and I/O device implementations may choose and document whatever behavior 
they need. It is the responsibility of system designers to impose enough consistency to 
allow processors successfully to access matching non-memory devices in a coherent way. 


5.3 Translation Buffers and Virtual Caches 


A system may choose to include a virtual instruction cache (virtual I-cache) or a virtual data 
cache (virtual D-cache). A system may also choose to include either a combined data and 
instruction translation buffer (TB) or separate data and instruction TBs (DTB and ITB). The 
contents of these caches and/or translation buffers may become invalid, depending on what 
operating system activity is being performed. 


Whenever a non-software field of a valid page table entry (PTE) is modified, copies of that 
PTE must be made coherent. PALcode mechanisms are available to clear all TBs, both DTB 
and ITB entries for a given VA, either DTB or ITB entries for a given VA, or all entries with 
the address space match (ASM) bit clear. Virtual D-cache entries are made coherent whenever 
the corresponding DTB entry is requested to be cleared by any of the appropriate PALcode 
mechanisms. Virtual I-cache entries can be made coherent via the IMB instruction. 


If a processor implements address space numbers (ASNs), and the old PTE has the Address 
Space Match (ASM) bit clear (ASNs in use) and the Valid bit set, then entries can also effec- 
tively be made coherent by assigning a new, unused ASN to the currently running process and 
not reusing the previous ASN before calling the appropriate PALcode routine to invalidate the 
translation buffer (TB). 


In a multiprocessor environment, making the TBs and/or caches coherent on only one proces- 
sor is not always sufficient. An operating system must arrange to perform the above actions on 
each processor that could possibly have copies of the PTE or data for any affected page. 


5.4 Caches and Write Buffers 


A hardware implementation may include mechanisms to reduce memory access time by making
local copies of recently used memory contents (or those expected to be used) or by
buffering writes to complete at a later time. Caches and write buffers are examples of these 
mechanisms. They must be implemented so that their existence is transparent to software 
(except for timing, error reporting/control/recovery, and modification to the I-stream). 

The following requirements must be met by all cache/write-buffer implementations. All processors
must provide a coherent view of memory.


Write buffers may be used to delay and aggregate writes. From the viewpoint of another 
processor, buffered writes appear not to have happened yet. (Write buffers must not 
delay writes indefinitely. See Section 5.6.1.9.) 


Write-back caches must be able to detect a later write from another processor and invalidate
or update the cache contents.


A processor must guarantee that a data store to a location followed by a data load from 
the same location reads the updated value. 


Cache prefetching is allowed, but virtual caches must not prefetch from invalid pages. 
See Sections 5.6.1.3, 5.6.4.3, and 5.6.4.4. 


A processor must guarantee that all of its previous writes are visible to all other proces- 
sors before a HALT instruction completes. A processor must guarantee that its caches 
are coherent with the rest of the system before continuing from a HALT. 


If battery backup is supplied, a processor must guarantee that the memory system 
remains coherent across a powerfail/recovery sequence. Data that was written by the 
processor before the powerfail may not be lost, and any caches must be in a valid state 
before (and if) normal instruction processing is continued after power is restored. 


Virtual instruction caches are not required to notice modifications of the virtual 
I-stream (they need not be coherent with the rest of memory). Software that creates or 
modifies the instruction stream must execute a CALL_PAL IMB before trying to execute
the new instructions.


In this context, to "modify the virtual I-stream" means either: 


— any Store to the same physical address that is subsequently fetched as an instruction 
by some corresponding (virtual address, ASN) pair, or 


— any change to the virtual-to-physical address mapping so that different values are 
fetched. 


For example, if two different virtual addresses, VA1 and VA2, map to the same page 
frame, a store to VA1 modifies the virtual I-stream fetched by VA2. 


However, the following sequence does not modify the virtual I-stream (this might 
happen in soft page faults). 


1. Change the mapping of an I-stream page from valid to invalid. 
2. Copy the corresponding page frame to a new page frame. 
3. Change the original mapping to be valid and point to the new page frame. 


Physical instruction caches are not required to notice modifications of the physical 
I-stream (they need not be coherent with the rest of memory), except for certain paging 
activity. (See Section 5.6.4.4.) Software that creates or modifies the instruction stream 
must execute a CALL_PAL IMB before trying to execute the new instructions. 


In this context, to "modify the physical I-stream" means any Store to the same 
physical address that is subsequently fetched as an instruction. 


5.5 Data Sharing


In a multiprocessor environment, writes to shared data must be synchronized by the programmer.


5.5.1 Atomic Change of a Single Datum 


The ordinary STL and STQ instructions can be used to perform an atomic change of a shared 
aligned longword or quadword. ("Change" means that the new value is not a function of the 
old value.) In particular, an ordinary STL or STQ instruction can be used to change a variable 
that could be simultaneously accessed via an LDx_L/STx_C sequence. 


5.5.2 Atomic Update of a Single Datum 


The load-locked/store-conditional instructions may be used to perform an atomic update of a 
shared aligned longword or quadword. ("Update" means that the new value is a function of the 
old value.) 


The following sequence performs a read-modify-write operation on location x. Only
register-to-register operate instructions and branch fall-throughs may occur in the sequence:


try_again:
        LDQ_L   R1,x                ; load-locked read of x
        <modify R1>
        STQ_C   R1,x                ; R1 = 1 if the store succeeded, 0 if not
        BEQ     R1,no_store         ; store failed: retry out of line


no_store:
        <code to check for excessive iterations>
        BR      try_again


If this sequence runs with no exceptions or interrupts, and no other processor writes to loca- 
tion x (more precisely, the locked range including x) between the LDQ_L and STQ_C 
instructions, then the STQ_C shown in the example stores the modified value in x and sets R1 
to 1. If, however, the sequence encounters exceptions or interrupts that eventually continue the 
sequence, or another processor writes to x, then the STQ_C does not store and sets R1 to 0. In 
this case, the sequence is repeated by the branches to no_store and try_again. This repetition 
continues until the reasons for exceptions or interrupts are removed and no interfering store is 
encountered. 


To be useful, the sequence must be constructed so that it can be replayed an arbitrary number 
of times, giving the same result values each time. A sufficient (but not necessary) condition is 
that, within the sequence, the set of operand destinations and the set of operand sources are 
disjoint. 


Software Note:


A sufficiently long instruction sequence between LDx_L and STx_C will never complete, 
because periodic timer interrupts will always occur before the sequence completes. The 
rules in Appendix A describe sequences that will eventually complete in all Alpha 
implementations. 

This load-locked/store-conditional paradigm may be used whenever an atomic update of a
shared aligned quadword is desired, including getting the effect of atomic byte writes. 
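
The retry behavior of the load-locked/store-conditional sequence can be modeled with a small single-threaded Python sketch. It is not Alpha code. The class ToyMemory, its per-processor lock flag, and the interfere hook are hypothetical names introduced here, and the model assumes a single location and ignores exceptions and interrupts.

```python
class ToyMemory:
    """One shared location plus a lock flag per processor (a simplification)."""

    def __init__(self, value=0):
        self.value = value
        self.lock_flag = {}                 # processor id -> lock flag

    def load_locked(self, cpu):
        # Models LDQ_L: read the location and set this processor's lock flag.
        self.lock_flag[cpu] = True
        return self.value

    def store(self, cpu, new_value):
        # Models an ordinary store: it clears every other processor's lock flag.
        self.value = new_value
        for other in self.lock_flag:
            if other != cpu:
                self.lock_flag[other] = False

    def store_conditional(self, cpu, new_value):
        # Models STQ_C: returns 1 if the store was done, 0 if it was not.
        if not self.lock_flag.get(cpu, False):
            return 0
        self.store(cpu, new_value)
        self.lock_flag[cpu] = False
        return 1


def atomic_update(mem, cpu, modify, interfere=None, max_tries=10):
    """Replay load-locked / modify / store-conditional until the store succeeds.
    Returns the number of attempts that were needed."""
    for attempt in range(1, max_tries + 1):
        old = mem.load_locked(cpu)           # LDQ_L  R1,x
        new = modify(old)                    # <modify R1>
        if interfere is not None:
            interfere(attempt)               # another processor may write x here
        if mem.store_conditional(cpu, new):  # STQ_C  R1,x ; BEQ R1,no_store
            return attempt
    raise RuntimeError("too many iterations")
```

Because modify is recomputed from a freshly loaded value on every attempt, the sequence can be replayed any number of times, which is the property the text requires of a useful sequence.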


5.5.3 Atomic Update of Data Structures 


Before accessing shared writable data structures (those that are not a single aligned longword 
or quadword), the programmer can acquire control of the data structure by using an atomic 
update to set a software lock variable. Such a software lock can be cleared with an ordinary 
store instruction. 


A software-critical section, therefore, may look like the sequence: 


stq_c_loop:
spin_loop:
        LDQ     R1,lock_variable    ; This optional spin-loop code
        BLBS    R1,already_set      ; should be used unless the
                                    ; lock is known to be low-contention.
        LDQ_L   R1,lock_variable    ; \
        BLBS    R1,already_set      ;  \
        OR      R1,#1,R2            ;   > Set lock bit
        STQ_C   R2,lock_variable    ;  /
        BEQ     R2,stq_c_fail       ; /
        MB
        <critical section: updates various data structures>
        MB                          ; Second MB
        STQ     R31,lock_variable   ; Clear lock bit


already_set:
        <code to block or reschedule or test for too many iterations>
        BR      spin_loop
stq_c_fail:
        <code to test for too many iterations>
        BR      stq_c_loop


This code has a number of subtleties: 


• If the lock_variable is already set, the spin loop is done without doing any stores. This
avoidance of stores improves memory subsystem performance and avoids the deadlock
described below. The loop uses an ordinary load. This code sequence is preferred unless
the lock is known to be low-contention, because the sequence increases the probability
that the LDQ_L hits in the cache and the LDQ_L/STQ_C sequence completes quickly
and successfully.


• If the lock_variable is actually being changed from 0 to 1, and the STQ_C fails (due to
an interrupt, or because another processor simultaneously changed lock_variable), the 
entire process starts over by reading the lock_variable again. 


• Only the fall-through path of the BLBS instructions does a STx_C; some implementations
may not allow a successful STx_C after a branch-taken.

• Only register-to-register operate instructions are used to do the modify.

• Both conditional branches are forward branches, so they are properly predicted not to
be taken (to match the common case of no contention for the lock).


• The OR writes its result to a second register; this allows the OR and the BLBS to be
interchanged if that would give a faster instruction schedule. 


• Other operate instructions (from the critical section) may be scheduled into the
LDQ_L..STQ_C sequence, so long as they do not fault or trap and they give correct 
results if repeated; other memory or operate instructions may be scheduled between the 
STQ_C and BEQ. 


• The memory barrier instructions are discussed in Section 5.5.4. It is correct to substitute
WMB for the second MB only if: 


— All data locations that are read or written in the critical section are accessed only 
after acquiring a software lock by using lock_variable (and before releasing the 
software lock). 


— For each read u of shared data in the critical section, there is a write v such that: 


1. v is BEFORE the WMB
2. v follows u in processor issue sequence (see Section 5.6.1.1)
3. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1), or
both.


— Both lock_variable and all the shared data are in memory-like regions (or 
lock_variable and all the shared data are in non-memory-like regions). If the 
lock_variable is in a non-memory-like region, the atomic lock protocol must use 
some implementation-specific hardware support. 


Generally, the substitution of a WMB for the second MB increases performance.

• An ordinary STQ instruction is used to clear the lock_variable.


It would be a performance mistake to spin-wait by repeating the full LDQ_L..STQ_C 
sequence (to move the BLBS after the BEQ) because that sequence may repeatedly change the 
software lock_variable from "locked" to "locked," with each write causing extra access delays 
in all other caches that contain the lock_variable. In the extreme, spin-waits that contain writes 
may deadlock as follows: 


If, when one processor spins with writes, another processor is modifying (not changing) 
the lock_variable, then the writes on the first processor may cause the STx_C of the 
modify on the second processor always to fail. 

This deadlock situation is avoided by: 
• Having only one processor execute a store (no STx_C), or

• Having no write in the spin loop, or

• Doing a write only if the shared variable actually changes state (1 → 1 does not change
state).
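
The "spin with ordinary reads, write only when the lock appears free" discipline can be sketched in Python threads. This is a rough model under stated assumptions: ToySpinLock and run_demo are hypothetical names, a threading.Lock stands in for the atomicity that LDQ_L/STQ_C provides in hardware, and Python exposes no MB or WMB, so the memory barriers of the Alpha sequence have no counterpart here.

```python
import threading
import time


class ToySpinLock:
    def __init__(self):
        self._word = 0                    # lock_variable; bit 0 set means "held"
        self._atomic = threading.Lock()   # stand-in for LDQ_L/STQ_C atomicity

    def _try_set(self):
        # Models LDQ_L / BLBS / OR / STQ_C: set bit 0 only if it is clear.
        with self._atomic:
            if self._word & 1:
                return False              # already set: no store is performed
            self._word |= 1
            return True

    def acquire(self):
        while True:
            while self._word & 1:         # spin loop: ordinary reads, no stores
                time.sleep(0)             # let the current holder make progress
            if self._try_set():
                return                    # lock bit changed from 0 to 1

    def release(self):
        self._word = 0                    # an ordinary store clears the lock


def run_demo(workers=4, iterations=200):
    lock = ToySpinLock()
    shared = {"count": 0}

    def worker():
        for _ in range(iterations):
            lock.acquire()
            shared["count"] += 1          # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

A waiting thread never writes the lock word while it is held, so it cannot disturb another thread's update attempt; the only writes are the 0-to-1 transition in _try_set and the clearing store in release.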


5.5.4 Ordering Considerations for Shared Data Structures


A critical section sequence, such as shown in Section 5.5.3, is conceptually only three steps: 


1. Acquire software lock


2. Critical section — read/write shared data 


3. Clear software lock 


In the absence of explicit instructions to the contrary, the Alpha architecture allows reads and 
writes to be reordered. While this may allow more implementation speed and overlap, it can 
also create undesired side effects on shared data structures. Normally, the critical section just 
described would have two instructions added to it: 


<acquire software lock> 

MB (memory barrier #1) 

<critical section — read/write shared data> 
MB (memory barrier #2) 

<clear software lock> 



The first memory barrier prevents any reads (from within the critical section) from being 
prefetched before the software lock is acquired; such prefetched reads would potentially con- 
tain stale data. 


The second memory barrier prevents any writes and reads in the critical section being delayed 
past the clearing of the software lock. Such delayed accesses could interact with the next user 
of the shared data, defeating the purpose of the software lock entirely. It is correct to substitute 
WMB for the second MB only if: 


1. All data locations that are read or written in the critical section are accessed only after
acquiring a software lock by using lock_variable (and before releasing the software
lock).

2. For each read u of shared data in the critical section, there is a write v such that:

a. v is BEFORE the WMB

b. v follows u in processor issue sequence (see Section 5.6.1.1)

c. v either depends on u (see Section 5.6.1.7) or overlaps u (see Section 5.6.1), or both.

3. Both lock_variable and all the shared data are in memory-like regions (or lock_variable
and all the shared data are in non-memory-like regions). If the lock_variable is in a
non-memory-like region, the atomic lock protocol must use some implementation-specific
hardware support.


Generally, the substitution of a WMB for the second MB increases performance. 


Software Note: 


In the VAX architecture, many instructions provide noninterruptable read-modify-write 
sequences to memory variables. Most programmers never regard data sharing as an issue. 


In the Alpha architecture, programmers must pay more attention to synchronizing access 
to shared data; for example, to AST routines. In the VAX architecture, a programmer can 
use an ADDL2 to update a variable that is shared between a "MAIN" routine and an AST
routine, if running on a single processor. In the Alpha architecture, a programmer must
deal with AST shared data by using multiprocessor shared data sequences. 


5.6 Read/Write Ordering 


This section applies to programs that run on multiple processors or on one or more processors 
that are interacting with DMA I/O devices. To a program running on a single processor and 
not interacting with DMA I/O devices, all memory accesses appear to happen in the order 
specified by the programmer. This section deals with predictable read/write ordering across 
multiple processors and/or DMA I/O devices. 


The order of reads and writes done in an Alpha implementation may differ from that specified 
by the programmer. 


For any two memory accesses A and B, either A must occur before B in all Alpha implementa- 
tions, B must occur before A, or they are UNORDERED. In the last case, software cannot 
depend upon one occurring first: the order may vary from implementation to implementation, 
and even from run to run or moment to moment on a single implementation. 


If two accesses cannot be shown to be ordered by the rules given, they are UNORDERED and 
implementations are free to do them in any order that is convenient. Implementations may take 
advantage of this freedom to deliver substantially higher performance. 


The discussion that follows first defines the architectural issue sequence of memory accesses 
on a single processor, then defines the (partial) ordering on this issue sequence that all Alpha 
implementations are required to maintain. 


The individual issue sequences on multiple processors are merged into access sequences at 
each shared memory location. The discussion defines the (partial) ordering on the individual 
access sequences that all Alpha implementations are required to maintain. 


The net result is that for any code that executes on multiple processors, one can determine 
which memory accesses are required to occur before others on all Alpha implementations and 
hence can write useful shared-variable software. 


Software writers can force one access to occur before another by inserting a memory barrier 
instruction (MB, WMB, or CALL_PAL IMB) between the accesses. 


5.6.1 Alpha Shared Memory Model 


An Alpha system consists of a collection of processors, I/O devices (and possibly a bridge to 
connect remote I/O devices), and shared memories that are accessible by all processors. 


Note: 
An example of an unshared location is a physical address in I/O space that refers to a CSR 
that is local to a processor and not accessible by other processors. 


A processor is an Alpha CPU. 

In most systems, DMA I/O devices or other agents can read or write shared memory locations.
The order of accesses by those agents is not completely specified in this document. It is possi- 
ble in some systems for read accesses by I/O devices or other agents to give results indicating 
some reordering of accesses. However, there are guarantees that apply in all systems. See Sec- 
tion 5.6.4.7. 


\Note: 


A companion I/O bus or I/O device family architectural specification should exist and 
should document the read/write order properties of the I/O family. To simplify porting of 
device drivers, multiple generations and classes of systems based on a particular I/O bus 
or family of I/O devices should conform to a single I/O architecture specification.\ 


A shared memory is the primary storage place for one or more locations. 


A location is a byte, specified by its physical address. Multiple virtual addresses may map to 
the same physical address. Ordering considerations are based only on the physical address. 
This definition of location specifically includes locations and registers in memory mapped I/O 
devices and bridges to remote I/O (for example, Mailbox Pointer Registers, or MBPRs). 


Implementation Note: 


An implementation may allow a location to have multiple physical addresses, but the rules 
for accesses via mixtures of the addresses are implementation-specific and outside the 
scope of this section. Accesses via exactly one of the physical addresses follow the rules 
described next. 


Each processor may generate accesses to shared memory locations. There are six types of 
accesses: 


1. Instruction fetch by processor i to location x, returning value a, denoted Pi:I<4>(x,a). 


2. Data read (including load-locked) by processor i to location x, returning value a, 
denoted Pi:R<size>(x,a). 


3. Data write (including successful store-conditional) by processor i to location x, storing 
value a, denoted Pi:W<size>(x,a). 


4. Memory barrier issued by processor i, denoted Pi:MB. 
5. Write memory barrier issued by processor i, denoted Pi:WMB.
6. I-stream memory barrier issued by processor i, denoted Pi:IMB. 


The first access type is also called an I-stream access or I-fetch. The next two are also called 
D-stream accesses. The first three types are collectively called read/write accesses, denoted 
Pi:Op<m>(x,a), where m is the size of the access in bytes, x is the (physical) address of the 
access, and a is a value representable in m bytes; for any k in the range 0..m-1, byte k of value
a (where byte 0 is the low-order byte) is the value written to or read from location x+k by the 
access. This relationship reflects little-endian addressing; big-endian addressing representation 
is as described in Chapter 2. 


The last three types collectively are called barriers or memory barriers. 


The size of a read/write access is 8 for a quadword access, 4 for a longword access (including 
all instruction fetches), 2 for a word access, or 1 for a byte access. All read/write accesses in
this chapter are naturally aligned. That is, they have the form Pi:Op<m>(x,a), where the 
address x is divisible by size m. 


The word "access" is also used as a verb; a read/write access Pi:Op<m>(x,a) accesses byte z if
x ≤ z < x+m. Two read/write accesses Op1<m>(x,a) and Op2<n>(y,b) are defined to overlap if
there is at least one byte that is accessed by both, that is, if max(x,y) < min(x+m,y+n).
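
The alignment, "accesses byte z", and overlap conditions translate directly into predicates. The following Python sketch is illustrative only; the function names are hypothetical, and addresses are treated as plain non-negative integers.

```python
VALID_SIZES = (1, 2, 4, 8)   # byte, word, longword, quadword


def is_naturally_aligned(x: int, m: int) -> bool:
    """True if an access of size m at address x is naturally aligned."""
    return m in VALID_SIZES and x % m == 0


def accesses_byte(x: int, m: int, z: int) -> bool:
    """True if the access Op<m>(x,a) accesses byte z, that is, x <= z < x+m."""
    return x <= z < x + m


def overlaps(x: int, m: int, y: int, n: int) -> bool:
    """True if Op1<m>(x,a) and Op2<n>(y,b) access at least one common byte."""
    return max(x, y) < min(x + m, y + n)


if __name__ == "__main__":
    # A quadword at 0x100 and a longword at 0x104 share bytes 0x104..0x107.
    print(overlaps(0x100, 8, 0x104, 4))
    # Two adjacent longwords share no byte.
    print(overlaps(0x100, 4, 0x104, 4))
```

For naturally aligned accesses, two accesses overlap exactly when the smaller one lies entirely within the larger one or the two are the same size at the same address.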


5.6.1.1 Architectural Definition of Processor Issue Sequence 


The issue sequence for a processor is architecturally defined with respect to a hypothetical sim- 
ple implementation that contains one processor and a single shared memory, with no caches or 
buffers. This is the instruction execution model: 


1. I-fetch: An Alpha instruction is fetched from memory. 


2. Read/Write: That instruction is executed and runs to completion, including a single data 
read from memory for a Load instruction or a single data write to memory for a Store 
instruction. 


3. Update: The PC for the processor is updated. 
4. Loop: Repeat the above sequence indefinitely. 


If the instruction fetch step gets a memory management fault, the I-fetch is not done and the 
PC is updated to point to a PALcode fault handler. If the read/write step gets a memory man- 
agement fault, the read/write is not done and the PC is updated to point to a PALcode fault 
handler. 


5.6.1.2 Definition of Before and After 


The ordering relation BEFORE (⇐) is a partial order on memory accesses. It is further
defined in Sections 5.6.1.3 through 5.6.1.9.


The ordering relation BEFORE (⇐), being a partial order, is acyclic.


The BEFORE order cannot be observed directly, nor fully predicted before an actual execu- 
tion, nor reproduced exactly from one execution to another. Nonetheless, some useful ordering 
properties must hold in all Alpha implementations. 


If u ⇐ v, then v is said to be AFTER u.
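
A minimal Python sketch of how a partial order classifies a pair of accesses follows. It assumes that the BEFORE constraints known to hold for one execution have already been collected as directed edges between access names; the edge dictionary, the access names, and both function names are hypothetical.

```python
def reachable(edges, src, dst):
    """True if dst can be reached from src by following one or more edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def relation(edges, u, v):
    """Classify access u relative to access v under the transitive BEFORE order."""
    if reachable(edges, u, v):
        return "BEFORE"
    if reachable(edges, v, u):
        return "AFTER"
    return "UNORDERED"


def is_acyclic(edges):
    """A partial order admits no access that is BEFORE itself."""
    return not any(reachable(edges, node, node) for node in edges)


if __name__ == "__main__":
    # Two writes separated by a memory barrier, plus an unrelated read.
    edges = {"W1": ["MB"], "MB": ["W2"]}
    print(relation(edges, "W1", "W2"))
    print(relation(edges, "W1", "R9"))
```

Pairs reported as UNORDERED are exactly those on which software must not depend, since an implementation may perform them in either order.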


5.6.1.3 Definition of Processor Issue Constraints 


Processor issue constraints are imposed on the processor issue sequence defined in Section 
5.6.1.1, as shown in Table 5-1: 

Table 5-1: Processor Issue Constraints

1st ↓  2nd →     Pi:I<4>(y,b)   Pi:R<n>(y,b)   Pi:W<n>(y,b)   Pi:MB   Pi:IMB
Pi:R<m>(x,a)                    ⇐ if overlap   ⇐ if overlap   ⇐       ⇐
Pi:W<m>(x,a)                                   ⇐ if overlap   ⇐       ⇐
Pi:MB                           ⇐              ⇐              ⇐       ⇐
Pi:IMB           ⇐              ⇐              ⇐              ⇐       ⇐


Where "overlap" denotes the condition max(x,y) < min(x+m,y+n). 


For two accesses u and v issued by processor Pi, if u precedes v by processor issue constraint, 
then u precedes v in BEFORE order. u and v on Pi are ordered by processor issue constraint if 
any of the following applies: 


1. The entry in Table 5-1 indicated by the access type of u (1st) and v (2nd) indicates the
accesses are ordered.

2. u and v are both writes to memory-like regions and there is a WMB between u and v in
processor issue sequence.

3. u and v are both writes to non-memory-like regions and there is a WMB between u and
v in processor issue sequence.

4. u is a TB fill that updates a PTE, for example, a PTE read in order to satisfy a TB miss,
and v is an I- or D-stream access using that PTE (see Sections 5.6.4.3 and 5.6.4.4).


In Table 5-1, 1st and 2nd refer to the ordering of accesses in the processor issue sequence.
Note that Table 5-1 imposes no direct constraint on the ordering relationship between
nonoverlapping read/write accesses, though there may be indirect constraints due to the
transitivity of BEFORE (⇐). Conditions 2 through 4, above, impose ordering constraints on
some pairs of nonoverlapping read/write accesses.


Table 5-1 permits a read access Pi:R<n>(y,b) to be ordered BEFORE an overlapping write
access Pi:W<m>(x,a) that precedes the read access in processor issue order. This asymmetry 
for reads allows reads to be satisfied by using data from an earlier write in processor issue 
sequence by the same processor (for example, by hitting in a write buffer) before the write 
completes. The write access remains "visible" to the read access; "visibility" is described in 
Sections 5.6.1.5 and 5.6.1.6 and illustrated in Litmus Test 11 in Section 5.6.2.11. 


An I-fetch Pi:I<4>(y,b) may also be ordered BEFORE an overlapping write Pi:W<m>(x,a) 
that precedes it in processor issue sequence. In that case, the write may, but need not, be
visible to the I-fetch. This asymmetry in Table 5-1 allows writes to the I-stream to be incoherent
until a CALL_PAL IMB is executed. 


Implementations are free to perform memory accesses from a single processor in any sequence 
that is consistent with processor issue constraints. 


5.6.1.4 Definition of Location Access Constraints


Location access constraints are imposed on overlapping read/write accesses. If u and v are 
overlapping read/write accesses, at least one of which is a write, then u and v must be comparable
in the BEFORE (⇐) ordering, that is, either u ⇐ v or v ⇐ u.


There is no direct requirement that nonoverlapping accesses be comparable in the BEFORE 
(⇐) ordering.


All writes accessing any given byte are totally ordered, and any read or I-fetch accessing a 
given byte is ordered with respect to all writes accessing that byte. 


5.6.1.5 Definition of Visibility 


If u is a write access Pi:W<m>(x,a) and v is an overlapping read access Pj:R<n>(y,b), u is visi- 
ble to v only if: 


u ⇐ v, or
u precedes v in processor issue sequence (possible only if Pi=Pj).


If u is a write access Pi:W<m>(x,a) and v is an overlapping instruction fetch Pj:I<4>(y,b), 
there are the following rules for visibility: 


1. If u ⇐ v, then u is visible to v.
2. If u precedes v in processor issue sequence, then:
   a. If there is a write w such that:

      u overlaps w and precedes w in processor issue sequence, and
      w is visible to v,

      then u is visible to v.
   b. If there is an instruction fetch w such that:
      u is visible to w, and
      w overlaps v and precedes v in processor issue sequence,
      then u is visible to v.


3. If u does not precede v in either processor issue sequence or BEFORE order, then u is 
not visible to v. 


Note that the rules of visibility for reads and instruction fetches are slightly different. If a write 
u precedes an overlapping instruction fetch v in processor issue sequence, but u is not 
BEFORE v, then u may or may not be visible to v. 


5.6.1.6 Definition of Storage 
The property of storage applies only to memory-like regions. 


The value read from any byte by a read access or instruction fetch v is the value written by the
latest (in BEFORE order) write u to that byte that is visible to v. More formally:


If u is Pi:W<m>(x,a), and v is either Pj:I<4>(y,b) or Pj:R<n>(y,b), and z is a byte
accessed by both u and v, and u is visible to v; and there is no write that is AFTER u, is
visible to v, and accesses byte z; then the value of byte z read by v is exactly the value
written by u. In this situation, u is a source of v.


The only way to communicate information between different processors is for one to write a 
shared location and the other to read the shared location and receive the newly written value. 
(In this context, the sending of an interrupt from processor Pi to Pj is modeled as Pi writing to
a location INTij, and Pj reading from INTij.)
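

The storage rule for a read can be sketched in Python as follows. This is a simplified model, not a definition: each access touches a single labeled byte, BEFORE is a hypothetical set of (u, v) name pairs assumed to be transitively closed, and either visibility condition for reads is treated as sufficient (as the litmus test analyses in Section 5.6.2 do). All function and field names are invented for the example, which uses the accesses of Litmus Test 1.

```python
def visible_writes(read, writes, before, issue_order):
    """Writes to the same byte that are BEFORE `read` or earlier in its issue sequence."""
    seq = issue_order[read["proc"]]
    result = []
    for w in writes:
        if w["byte"] != read["byte"]:
            continue
        same_proc_earlier = (
            w["proc"] == read["proc"]
            and seq.index(w["name"]) < seq.index(read["name"])
        )
        if (w["name"], read["name"]) in before or same_proc_earlier:
            result.append(w)
    return result

def value_read(read, writes, before, issue_order):
    """Value of the latest (in BEFORE order) write that is visible to `read`."""
    candidates = visible_writes(read, writes, before, issue_order)
    for u in candidates:
        later = any(
            w is not u and (u["name"], w["name"]) in before for w in candidates
        )
        if not later:
            return u["value"]  # u is a source of the read
    return None  # no visible write; not expected once an initializing write exists

# Accesses of Litmus Test 1: x initially 1, Pi writes 2, Pj reads x twice.
writes = [
    {"name": "INIT", "proc": "init", "byte": "x", "value": 1},
    {"name": "U1", "proc": "Pi", "byte": "x", "value": 2},
]
v2 = {"name": "V2", "proc": "Pj", "byte": "x"}
issue_order = {"init": ["INIT"], "Pi": ["U1"], "Pj": ["V1", "V2"]}
before = {("INIT", "U1"), ("INIT", "V1"), ("INIT", "V2"),
          ("U1", "V1"), ("V1", "V2"), ("U1", "V2")}
value_seen_by_v2 = value_read(v2, writes, before, issue_order)
```

Because U1 <= V2 and U1 is AFTER the initialization, the model returns the value written by U1.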


5.6.1.7 Definition of Dependence Constraint 


The depends relation (DP) is defined as follows. Given u and v issued by processor Pi, where
u is a read or an instruction fetch and v is a write, u precedes v in DP order (written u DP v,
that is, v depends on u) in either of the following situations:


• u determines the execution of v, the location accessed by v, or the value written by v.


• u determines the execution or address or value of another memory access z that precedes
  v or might precede v (that is, would precede v in some execution path depending
  on the value read by u) by processor issue constraint (see Section 5.6.1.3).


Note that the DP relation does not directly impose a BEFORE (<=) ordering between accesses
u and v.


The dependence constraint requires that the union of the DP relation and the "is a source of" 
relation (see Section 5.6.1.6) be acyclic. That is, there must not exist reads and/or I-fetches 
R1, ..., Rn, and writes W1, ..., Wn, such that: 


1. n ≥ 1,
2. For each i, 1 ≤ i ≤ n, Ri DP Wi,
3. For each i, 1 ≤ i < n, Wi is a source of Ri+1, and
4. Wn is a source of R1.


That constraint eliminates the possibility of "causal loops." A simple example of a "causal 
loop" is when the execution of a write on Pi depends on the execution of a write on Pj and vice 
versa, creating a circular dependence chain. The following simple example of a "causal loop" 
is written in the style of the litmus tests in Section 5.6.2, where initially x and y are 1: 


Processor Pi executes: 


LDQ R1,x
STQ R1,y 


Processor Pj executes: 


LDQ R1,y 
STQ R1,x
Representing those code sequences in the style of the litmus tests in Section 5.6.2, it is
impossible for the following sequence to result:


Pi                      Pj
[U1] Pi:R<8>(x,0)       [V1] Pj:R<8>(y,0)
[U2] Pi:W<8>(y,0)       [V2] Pj:W<8>(x,0)
Analysis: 


<1> By the definitions of storage and visibility, U2 is the source of V1, and V2 is the
source of U1.


<2> By the definition of DP and examination of the code, U1 DP U2, and V1 DP V2.


<3> Thus, U1 DP U2, U2 is the source of V1, V1 DP V2, and V2 is the source of U1.
This circular chain is forbidden by the dependence constraint.


Given the initial condition x, y = 1, the access sequence above would also be impossible if the 
code were: 


Processor Pi’s program: 


LDQ R1,x 
BNE R1,done 
STQ R31,y 


done: 


Processor Pj’s program: 


LDQ R1,y
BNE R1,done 
STQ R31,x 


done: 
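

The dependence constraint amounts to a cycle check on a directed graph. The Python sketch below is one possible way to express it; the edge lists are transcribed from the causal-loop analysis above, and the function name and graph representation are assumptions made for illustration.

```python
def find_cycle(edges):
    """Return one cycle (as a list of nodes) in the directed graph `edges`, or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # node -> "active" (on the DFS stack) or "done"
    path = []

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for start in graph:
        if start not in state:
            found = visit(start)
            if found:
                return found
    return None

dp = [("U1", "U2"), ("V1", "V2")]           # read DP write, from the code
source_of = [("U2", "V1"), ("V2", "U1")]    # write "is a source of" read
causal_loop = find_cycle(dp + source_of)
allowed = find_cycle(dp + [("U2", "V1")])   # V2 no longer feeds U1
```

For the forbidden sequence the check returns the chain U1, U2, V1, V2, U1; if V2 is not a source of U1, the union is acyclic and the constraint is satisfied.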


5.6.1.8 Definition of Load-Locked and Store-Conditional 


The property of load-locked and store-conditional applies only to memory-like regions.


For each successful store-conditional v, there exists a load-locked u such that the following are 
true: 


1. u precedes v in the processor issue sequence. 


2. There is no load-locked or store-conditional between u and v in the processor issue 
sequence. 


3. If u and v access within the same naturally aligned 16-byte physical and virtual block in
memory, then for every write w by a different processor that accesses within u’s lock
range (where w is either a store or a successful store conditional), it must be true that
w <= u or v <= w.
u’s lock range contains the region of physical memory that u accesses. See Sections 4.2.4 and 
4.2.5, which define the lock range and conditions for success or failure of a store conditional. 
5.6.1.9 Timeliness


Even in the absence of a barrier after the write, no write by a processor may be delayed indefi- 
nitely in the BEFORE ordering. 


5.6.2 Litmus Tests 


Many issues about writing and reading shared data can be cast into questions about whether a 
write is before or after a read. These questions can be answered by rigorously checking 
whether any ordering satisfies the rules in Sections 5.6.1.3 through 5.6.1.8. 


In litmus tests 1 through 9 below, all initial quadword memory locations contain 1. In all these
litmus tests, it is assumed that initializations are performed by a write or writes that are BEFORE all
the explicitly listed accesses, that all relevant writes other than the initializations are explicitly
shown, and that all accesses shown are to memory-like regions (so the definition of storage
applies).


5.6.2.1 Litmus Test 1 (Impossible Sequence)


Initially, location x contains 1:


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:R<8>(x,2)
                        [V2]Pj:R<8>(x,1)
Analysis: 


<1> By the definition of storage (Section 5.6.1.6), V1 reading 2 implies that U1 is visible
to V1.


<2> By the rules for visibility (Section 5.6.1.5), U1 being visible to V1, but being issued
by a different processor, implies that U1 <= V1.


<3> By the processor issue constraints (Section 5.6.1.3), V1 <= V2.


<4> By the transitivity of the partial order <=, it follows from <2> and <3> that U1 <=
V2.


<5> By the rules for visibility, it follows from U1 <= V2 that U1 is visible to V2.


<6> Since U1 is AFTER the initialization of x, U1 is the latest (in the <= ordering) write
to x that is visible to V2.


<7> By the definition of storage, it follows that V2 should read the value written by U1, 
in contradiction to the stated result. 


Thus, once a processor reads a new value from a location, it must never see an old value — 
time must not go backward. V2 must read 2. 
5.6.2.2 Litmus Test 2 (Impossible Sequence)


Initially, location x contains 1:


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:W<8>(x,3)
                        [V2]Pj:R<8>(x,2)
                        [V3]Pj:R<8>(x,3)
Analysis: 


<1> Since V1 precedes V2 in processor issue sequence, V1 is visible to V2.
<2> V2 reading 2 implies U1 is the latest (in <= order) write to x visible to V2.
<3> From <1> and <2>, V1 <= U1.
<4> Since U1 is visible to V2, and they are issued by different processors, U1 <= V2.
<5> By the processor issue constraints, V2 <= V3.
<6> From <4> and <5>, U1 <= V3.
<7> From <6> and the visibility rules, U1 is visible to V3.


<8> Since both V1 and the initialization of x are BEFORE U1, U1 is the latest write to x 
that is visible to V3. 


<9> By the definition of storage, it follows that V3 should read the value written by U1, 
in contradiction to the stated result. 


Thus, once processor Pj reads a new value written by U1, any other writes that must precede 
the read must also precede U1. V3 must read 2. 


5.6.2.3 Litmus Test 3 (Impossible Sequence) 


Initially, location x contains 1: 


Pi                      Pj                      Pk
[U1]Pi:W<8>(x,2)        [V1]Pj:W<8>(x,3)        [W1]Pk:R<8>(x,3)
[U2]Pi:R<8>(x,3)                                [W2]Pk:R<8>(x,2)
Analysis: 


<1> U2 reading 3 implies V1 is the latest write to x visible to U2, therefore U1 <= V1.


<2> W1 reading 3 implies V1 is visible to W1, so V1 <= W1 <= W2, therefore V1 is also
visible to W2.


<3> W2 reading 2 implies U1 is the latest write to x visible to W2, therefore V1 <= U1.


<4> From <1> and <3>, U1 <= V1 <= U1.


Again, time cannot go backwards. If U1 is ordered before V1, then processor Pk cannot read
first the later value 3 and then the earlier value 2. Alternatively, if V1 is ordered before U1, U2
must read 2.


5.6.2.4 Litmus Test 4 (Sequence Okay) 


Initially, locations x and y contain 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:R<8>(y,2)
[U2]Pi:W<8>(y,2)        [V2]Pj:R<8>(x,1)
Analysis: 


<1> V1 reading 2 implies U2 <= V1, by storage and visibility.
<2> Since V2 does not read 2, there cannot be U1 <= V2.


<3> By the access order constraints, it follows from <2> that V2 <= U1.


There are no conflicts in the sequence. There are no violations of the definition of BEFORE. 


5.6.2.5 Litmus Test 5 (Sequence Okay) 


Initially, locations x and y contain 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:R<8>(y,2)
                        [V2]Pj:MB
[U2]Pi:W<8>(y,2)        [V3]Pj:R<8>(x,1)
Analysis: 


<1> V1 reading 2 implies U2 <= V1, by storage and visibility.
<2> V1 <= V2 <= V3, by processor issue constraints.


<3> V3 reading 1 implies V3 <= U1, by storage and visibility.


There is U2 <= V1 <= V2 <= V3 <= U1. There are no conflicts in this sequence. There are no
violations of the definition of BEFORE.


5.6.2.6 Litmus Test 6 (Sequence Okay) 


Initially, locations x and y contain 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:R<8>(y,2)
[U2]Pi:MB
[U3]Pi:W<8>(y,2)        [V2]Pj:R<8>(x,1)
Analysis: 


<1> U1 <= U2 <= U3, by processor issue constraints.
<2> V1 reading 2 implies U3 <= V1, by storage and visibility.
<3> V2 reading 1 implies V2 <= U1, by storage and visibility.


There is V2 <= U1 <= U2 <= U3 <= V1. There are no conflicts in this sequence. There are no
violations of the definition of BEFORE.


In litmus tests 4, 5, and 6, writes to two different locations x and y are observed (by another 
processor) to occur in the opposite order than that in which they were performed. An update to 
y propagates quickly to Pj, but the update to x is delayed, and Pi and Pj do not both have MBs. 


5.6.2.7 Litmus Test 7 (Impossible Sequence) 


Initially, locations x and y contain 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:R<8>(y,2)
[U2]Pi:MB               [V2]Pj:MB
[U3]Pi:W<8>(y,2)        [V3]Pj:R<8>(x,1)
Analysis:
<1> V3 reading 1 implies V3 <= U1, by storage and visibility.
<2> V1 reading 2 implies U3 <= V1, by storage and visibility.
<3> U1 <= U2 <= U3, by processor issue constraints.
<4> V1 <= V2 <= V3, by processor issue constraints.
<5> By <2>, <3>, and <4>, U1 <= U2 <= U3 <= V1 <= V2 <= V3.


Both <1> and <5> cannot be true, so if V1 reads 2, then V3 must also read 2. 


If both x and y are in memory-like regions, the sequence remains impossible if U2 is changed 
to a WMB. Similarly, if both x and y are in non-memory-like regions, the sequence remains 
impossible if U2 is changed to a WMB. 
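

The analysis of this test, and its contrast with Litmus Test 6, can be reproduced by asking whether the derived BEFORE edges admit any total order. The Python sketch below uses the standard-library graphlib module (Python 3.9 or later); the edge lists are transcribed from the analyses of tests 6 and 7, and the helper name is an assumption made for illustration.

```python
from graphlib import TopologicalSorter, CycleError

def linearize(before_edges):
    """Return one total order consistent with the u <= v edges, or None if cyclic."""
    sorter = TopologicalSorter()
    for u, v in before_edges:
        sorter.add(v, u)  # v has predecessor u, so u is emitted first
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

# Litmus test 7: an MB on both processors.
test7 = [
    ("V3", "U1"),                  # <1> V3 reads 1
    ("U3", "V1"),                  # <2> V1 reads 2
    ("U1", "U2"), ("U2", "U3"),    # <3> issue constraints on Pi
    ("V1", "V2"), ("V2", "V3"),    # <4> issue constraints on Pj
]
# Litmus test 6: same values read, but Pj has no MB, so V1 and V2 are unordered.
test6 = [
    ("V2", "U1"),                  # V2 reads 1
    ("U3", "V1"),                  # V1 reads 2
    ("U1", "U2"), ("U2", "U3"),    # issue constraints on Pi
]
order7 = linearize(test7)
order6 = linearize(test6)
```

No order exists for test 7, so the sequence is impossible; for test 6 the sketch yields the order V2, U1, U2, U3, V1 given in that test's analysis.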


5.6.2.8 Litmus Test 8 (Impossible Sequence) 


Initially, locations x and y contain 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:W<8>(y,2)
[U2]Pi:MB               [V2]Pj:MB
[U3]Pi:R<8>(y,1)        [V3]Pj:R<8>(x,1)
Analysis:
<1> V3 reading 1 implies V3 <= U1, by storage and visibility.
<2> U3 reading 1 implies U3 <= V1, by storage and visibility.
<3> U1 <= U2 <= U3, by processor issue constraints.
<4> V1 <= V2 <= V3, by processor issue constraints.
<5> By <2>, <3>, and <4>, U1 <= U2 <= U3 <= V1 <= V2 <= V3.


Both <1> and <5> cannot be true, so if U3 reads 1, then V3 must read 2, and vice versa. 
5.6.2.9 Litmus Test 9 (Impossible Sequence)


Initially, location x contains 1: 


Pi                      Pj
[U1]Pi:W<8>(x,2)        [V1]Pj:W<8>(x,3)
[U2]Pi:R<8>(x,2)        [V2]Pj:R<8>(x,3)
[U3]Pi:R<8>(x,3)        [V3]Pj:R<8>(x,2)
Analysis:


<1> V3 reading 2 implies U1 is the latest write to x visible to V3, therefore V1 <= U1.


<2> U3 reading 3 implies V1 is the latest write to x visible to U3, therefore U1 <= V1.


Both <1> and <2> cannot be true. Time cannot go backwards. If V3 reads 2, then U3 must 
read 2. Alternatively, if U3 reads 3, then V3 must read 3. 


5.6.2.10 Litmus Test 10 (Sequence Okay) 

For an aligned quadword location, x, initially 100000001₁₆:


Pi                              Pj
[U1]Pi:W<4>(x,2)                [V1]Pj:W<4>(x+4,2)
[U2]Pi:R<8>(x,100000002₁₆)      [V2]Pj:R<8>(x,200000001₁₆)

Analysis:
<1> Since U2 reads 1 from x+4, V1 is not visible to U2. Thus U2 <= V1.
<2> Similarly, V2 <= U1.


<3> U1 is visible to U2, but since they are issued by the same processor, it is not
necessarily the case that U1 <= U2.


<4> Similarly, it is not necessarily the case that V1 <= V2. 


There is no ordering cycle, so the sequence is permitted. 


5.6.2.11 Litmus Test 11 (Impossible Sequence) 
For an aligned quadword location, x, initially 100000001₁₆:


Pi                      Pj
[U1]Pi:W<4>(x,2)        [V1]Pj:R<8>(x,200000001₁₆)
[U2]Pi:MB or WMB
[U3]Pi:W<4>(x+4,2)
Analysis:


<1> V1 reading 200000001₁₆ implies U3 <= V1 <= U1 by storage and visibility.
<2> U1 <= U2 <= U3, by processor issue constraints.


Both <1> and <2> cannot be true. 
5.6.3 Implied Barriers


There are no implied barriers in Alpha. If an implied barrier is needed for functionally correct 
access to shared data, it must be written as an explicit instruction. (Software must explicitly 
include any needed MB, WMB, or CALL_PAL IMB instructions.) 


Alpha transitions such as the following have no built-in implied memory barriers:
• Entry to PALcode
• Sending and receiving interrupts
• Returning from exceptions, interrupts, or machine checks
• Swapping context
• Invalidating the Translation Buffer (TB)


Depending on implementation choices for maintaining cache coherency, some PALcode/cache 
implementations may have an implied CALL_PAL IMB in the I-stream TB fill routine, but 
this is transparent to the non-PALcode programmer. 


5.6.4 Implications for Software 


Software must explicitly include MB, WMB, or CALL_PAL IMB instructions according to 
the following circumstances. 


5.6.4.1 Single Processor Data Stream 


No barriers are ever needed. A read to physical address x will always return the value written 
by the immediately preceding write to x in the processor issue sequence. 


5.6.4.2 Single Processor Instruction Stream 


An I-fetch from virtual or physical address x does not necessarily return the value written by 
the immediately preceding write to x in the issue sequence. To make the I-fetch reliably get 
the newly written instruction, a CALL_PAL IMB is needed between the write and the I-fetch. 


5.6.4.3 Multiprocessor Data Stream (Including Single Processor with DMA I/O)


Generally, the only way to reliably communicate shared data is to write the shared data on one
processor or DMA I/O device, execute an MB (or the logical equivalent¹ if it is a DMA I/O
device), then write a flag (equivalently, send an interrupt) signaling the other processor that
the shared data is ready. Each receiving processor must read the new flag (equivalently,
receive the interrupt), execute an MB, then read or update the shared data. In the special case
in which data is communicated through just one location in memory, memory barriers are not
necessary.


¹ In this context, the logical equivalent of an MB for a DMA device is whatever is necessary under the
applicable I/O subsystem architecture to ensure that preceding writes will be BEFORE (see Section
5.6.1.2) the subsequent write of a flag or transmission of an interrupt. Not all I/O devices behave
exactly as required by the Alpha architecture. To interoperate properly with those devices, some
special action might be required by the program executing on the CPU. For example, PCI bus devices
require that after the CPU has received an interrupt, the CPU must read a CSR location on the PCI
device, execute an MB, then read or update the shared data. From the perspective of the Alpha
architecture, this CSR read can be regarded as a necessary assist to help the DMA I/O device complete
its logical equivalent of an MB.


Software Note: 


Note that this section does not describe how to reliably communicate data from a
processor to a DMA device. See Section 5.6.4.7.


Leaving out the first MB removes the assurance that the shared data is written before the flag 
is written. 


Leaving out the second MB removes the assurance that the shared data is read or updated only 
after the flag is seen to change; in this case, an early read could see an old value, and an early 
update could be overwritten. 
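

The effect of omitting the second MB can be illustrated with a deliberately simplified Python model. It is a toy, not a description of any Alpha implementation: the ToyCpu class, its private cache, and its queue of pending invalidates are assumptions invented for this example, and only the receiving side is modeled (the sender's MB is a placeholder that does nothing here).

```python
class ToyCpu:
    """Hypothetical processor with a private cache and a queue of pending invalidates."""

    def __init__(self, memory):
        self.memory = memory              # shared dict: location name -> value
        self.cache = {}
        self.pending_invalidates = []
        self.peers = []

    def write(self, addr, value):
        self.memory[addr] = value
        self.cache[addr] = value
        for peer in self.peers:           # invalidates are queued, not yet applied
            peer.pending_invalidates.append(addr)

    def read(self, addr):
        if addr not in self.cache:        # miss: fetch the current memory value
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def mb(self):
        for addr in self.pending_invalidates:
            self.cache.pop(addr, None)    # deliver queued invalidates to the cache
        self.pending_invalidates.clear()

def run(use_second_mb):
    memory = {"data": 0, "flag": 0}
    sender, receiver = ToyCpu(memory), ToyCpu(memory)
    sender.peers, receiver.peers = [receiver], [sender]
    receiver.read("data")                 # receiver caches the old data value
    sender.write("data", 42)
    sender.mb()                           # first MB (no effect in this toy model)
    sender.write("flag", 1)
    if receiver.read("flag") != 1:
        return None
    if use_second_mb:
        receiver.mb()
    return receiver.read("data")

with_mb = run(True)
without_mb = run(False)
```

In this model the receiver sees the new flag in both runs, but reads the new data only when it executes the MB; without it, the stale cached value is returned.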


This implies that after a DMA I/O device has written some data to memory (such as paging in
a page from disk), the DMA device must logically execute an MB¹ before posting a
completion interrupt, and the interrupt handler software must execute an MB before the data is
guaranteed to be visible to the interrupted processor. Other processors must also execute MBs
before they are guaranteed to see the new data.


An important special case occurs when a write is done (perhaps by an I/O device) to some 
physical page frame, then an MB is executed, and then a previously invalid PTE is changed to 
be a valid mapping of the physical page frame that was just written. In this case, all processors 
that access virtual memory by using the newly valid PTE must guarantee to deliver the newly 
written data after the TB miss, for both I-stream and D-stream accesses. This can perhaps be
done in TB-miss PALcode.


5.6.4.4 Multiprocessor Instruction Stream (Including Single Processor with DMA I/O) 


The only way to update the I-stream reliably is to write the shared I-stream on one processor
or DMA I/O device, then execute a CALL_PAL IMB (or an MB if the processor is not going
to execute the new I-stream, or the logical equivalent of an MB if it is a DMA I/O device),
then write a flag (equivalently, send an interrupt) signaling the other processor that the shared
I-stream is ready. Each receiving processor must read the new flag (equivalently, receive the
interrupt), execute a CALL_PAL IMB, then fetch the shared I-stream.


Software Note: 


Note that this section does not describe how to reliably communicate I-stream from a 
processor to a DMA device. See Section 5.6.4.7. 


Leaving out the first CALL_PAL IMB (or MB) removes the assurance that the shared 
I-stream is written before the flag. 


Leaving out the second CALL_PAL IMB removes the assurance that the shared I-stream is 
read only after the flag is seen to change; in this case, an early read could see an old value. 


This implies that after a DMA I/O device has written some I-stream to memory (such as
paging in a page from disk), the DMA device must logically execute an MB¹ before posting a
completion interrupt, and the interrupt handler software must execute a CALL_PAL IMB
before the I-stream is guaranteed to be visible to the interrupted processor. Other processors
must also execute CALL_PAL IMB instructions before they are guaranteed to see the new
I-stream.


¹ See the footnote in Section 5.6.4.3.


An important special case occurs under the following circumstances: 
1. A write (perhaps by an I/O device) is done to some physical page frame.
2. A CALL_PAL IMB (or MB) is executed.
3. A previously invalid PTE is changed to be a valid mapping of the physical page frame
   that was written in step 1.


In this case, all processors that access virtual memory by using the newly valid PTE must guar- 
antee to deliver the newly written I-stream after the TB miss. 


5.6.4.5 Multiprocessor Context Switch 


If a process migrates from executing on one processor to executing on another, the context 
switch operating system code must include a number of barriers. 


A process migrates by having its context stored into memory, then eventually having that con- 
text reloaded on another processor. In between, some shared mechanism must be used to 
communicate that the context saved in memory by the first processor is available to the second 
processor. This could be done by using an interrupt, by using a flag bit associated with the 
saved context, or by using a shared-memory multiprocessor data structure, as follows: 


First Processor                         Second Processor

Save state of current process.
MB [1]
Pass ownership of process context   =>  Pick up ownership of process context data
data structure memory.                  structure memory.
                                        MB [2]
                                        Restore state of new process context data
                                        structure memory.
                                        Make I-stream coherent [3].
                                        Make TB coherent [4].
                                        Execute code for new process that accesses
                                        memory that is not common to all processes.


MB [1] ensures that the writes done to save the state of the current process happen before
the ownership is passed. 


MB [2] ensures that the reads done to load the state of the new process happen after the 
ownership is picked up and hence are reliably the values written by the processor saving 
the old state. Leaving this MB out makes the code fail if an old value of the context 
remains in the second processor’s cache and invalidates from the writes done on the first 
processor are not delivered soon enough. 


The TB on the second processor must be made coherent with any write to the page tables 
that may have occurred on the first processor just before the save of the process state. This 
must be done with a series of TB invalidate instructions to remove any nonglobal page 
mapping for this process, or by assigning an ASN that is unused on the second processor 
to the process. One of these actions must occur sometime before starting execution of the 
code for the new process that accesses memory (instruction or data) that is not common to 
all processes. A common method is to assign a new ASN after gaining ownership of the 
new process and before loading its context, which includes its ASN. 


The D-cache on the second processor must be made coherent with any write to the 
D-stream that may have occurred on the first processor just before the save of process 
state. This is ensured by MB [2] and does not require any additional instructions. 


The I-cache on the second processor must be made coherent with any write to the I-stream
that may have occurred on the first processor just before the save of process state. This
can be done with a CALL_PAL IMB sometime before the execution of any code that is
not common to all processes. More commonly, this can be done by forcing a TB miss (via
the new ASN or via TB invalidate instructions) and using the TB-fill rule (see Section
5.6.4.3). This latter approach does not require any additional instruction.


Combining all these considerations gives the following, where, on a single processor, there is 
no need for the barriers: 
First Processor                         Second Processor

Pick up ownership of process
context data structure memory.
MB
Assign new ASN or invalidate TBs.
Save state of current process.
Restore state of new process.
MB
Pass ownership of process context   =>  Pick up ownership of new process context
data structure memory.                  data structure memory.
                                        MB
                                        Assign new ASN or invalidate TBs.
                                        Save state of current process.
                                        Restore state of new process.
                                        MB
                                        Pass ownership of old process context data
                                        structure memory.
                                        Execute code for new process that accesses
                                        memory that is not common to all processes.


5.6.4.6 Multiprocessor Send/Receive Interrupt


If one processor writes some shared data, then sends an interrupt to a second processor, and
that processor receives the interrupt, then accesses the shared data, the sequence from Section
5.6.4.3 must be used:


First Processor         Second Processor

Write data
MB
Send interrupt    =>    Receive interrupt
                        MB
                        Access data


Leaving out the MB at the beginning of the interrupt-receipt routine causes the code to fail if 
an old value of the context remains in the second processor’s cache, and invalidates from the 
writes done on the first processor are not delivered soon enough. 


5.6.4.7 Implications for Memory Mapped I/O 


Sections 5.6.4.3 and 5.6.4.4 describe methods for communicating data from a processor or 
DMA I/O device to another processor that work reliably in all Alpha systems. Special consid- 
erations apply to the communication of data or I-stream from a processor to a DMA I/O 
device. These considerations arise from the use of bridges to connect to I/O buses with devices 
that are accessible by memory accesses to non-memory-like regions of physical memory. 


The following communication method works in all Alpha systems.


To reliably communicate shared data from a processor to an I/O device: 
1. Write the shared data to a memory-like physical memory region on the processor. 
2. Execute an MB instruction. 


3. Write a flag (equivalently, send an interrupt or write a register location implemented in 
the I/O device). 


The receiving I/O device must: 


1. Read the flag (equivalently, detect the interrupt or detect the write to the register loca- 
tion implemented in the I/O device). 


2. Execute the equivalent of an MB.¹
3. Read the shared data. 


As shown in Section 5.6.4.3, leaving out the memory barrier removes the assurance that the
shared data is written before the flag is. Unlike the case in Section 5.6.4.3, writing the shared
data to a non-memory-like physical memory region removes the assurance that the I/O device
will detect the writes of the shared data before detecting the flag write, interrupt, or device
register write.


¹ In this context, the logical equivalent of an MB for a DMA device is whatever is necessary under the
applicable I/O subsystem architecture to ensure that preceding writes will be BEFORE (see Section
5.6.1.2) the subsequent reads of shared data. Typically, this action is defined to be present between
every read and write access done by the I/O device, according to the applicable I/O subsystem
architecture.


This implies that after a processor has prepared a data buffer to be read from memory by a 
DMA I/O device (such as writing a buffer to disk), the processor must execute an MB before 
starting the I/O. The I/O device, after receiving the start signal, must logically execute an MB 
before reading the data buffer, and the buffer must be located in a memory-like physical mem- 
ory region. 


There are methods of communicating data that may work in some systems but are not guaran- 
teed in all systems. Two notable examples are: 


1. If an Alpha processor writes a location implemented in a component located on an I/O 
bus in the system, then executes a memory barrier, then writes a flag in some memory 
location (in a memory-like or non-memory-like region), a device on the I/O bus may be 
able to detect (via read access) the result of the flag in memory write and the write of 
the location on the I/O bus out of order (that is, in a different order than the order in 
which the Alpha processor wrote those locations). 


2. If an Alpha processor writes a location that is a control register within an I/O device, 
then executes a memory barrier, then writes a location in memory (in a memory-like or 
non-memory-like region), the I/O device may be able to detect (via read access) the 
result of the memory write before receiving and responding to the write of its own con- 
trol register. 


In almost every case, a mechanism that ensures the completion of writes to control register 
locations within I/O devices is provided. The normal and strongly recommended mechanism is 
to read a location after writing it, which guarantees that the write is complete. In any case, all 
systems that use a particular I/O device should provide the same mechanism for that device. 


5.6.4.8 Multiple Processors Writing to a Single I/O Device 


Generally, for multiple processors to cooperate in writing to a single I/O device, the first pro- 
cessor must write to the device, execute an MB, then notify other processors. Another 
processor that intends to write the same I/O device after the first processor must receive the 
notification, execute an MB, and then write to the I/O device. For example: 


First Processor                 Second Processor

Write CSR_A
MB
Write flag (in memory)    =>    Read flag (in memory)
                                MB
                                Write CSR_B
The MB on the first processor guarantees that the write to CSR_A precedes the write to flag in
memory, as perceived on other processors. (The MB does not guarantee that the write to 
CSR_A has completed. See Section 5.6.4.7 for a discussion of how a processor can guarantee 
that a write to an I/O device has completed at that device.) The MB on the second processor 
guarantees that the write to CSR_B will reach the I/O device after the write to CSR_A. 


5.6.5 Implications for Hardware 


The coherency point for physical address x is the place in the memory subsystem at which 
accesses to x are ordered. It may be at a main memory board, or at a cache containing x exclu- 
sively, or at the point of winning a common bus arbitration. 


The coherency point for x may move with time, as exclusive access to x migrates between 
main memory and various caches. 


MB and CALL_PAL IMB force all preceding writes to at least reach their respective coher- 
ency points. This does not mean that main-memory writes have been done, just that the order 
of the eventual writes is committed. For example, on the XMI with retry, this means getting 
the writes acknowledged as received with good parity at the inputs to memory board queues; 
the actual RAM write happens later. 


MB and CALL_PAL IMB also force all queued cache invalidates to be delivered to the local
caches before starting any subsequent reads (that may otherwise cache hit on stale data) or 
writes (that may otherwise write the cache, only to have the write effectively overwritten by a 
late-delivered invalidate). 


WMB ensures that the final order of writes to memory-like regions is committed and that the 
final order of writes to non-memory-like regions is committed. This does not imply that the 
final order of writes to memory-like regions relative to writes to non-memory-like regions is 
committed. It also prevents writes that precede the WMB from merging with writes that
follow the WMB. For example, an implementation with a write buffer might implement WMB
by closing all valid write buffer entries from further merging and then draining the write buffer
entries in order.
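

The write-buffer approach described above might be sketched in Python as follows. This is a toy model under stated assumptions: the WriteBuffer class, the 32-byte merge block, and the byte-granular entries are all invented for illustration and do not describe any particular implementation.

```python
class WriteBuffer:
    """Hypothetical merging write buffer; each entry covers one aligned block."""

    BLOCK = 32  # assumed merge granularity in bytes

    def __init__(self):
        self.entries = []  # FIFO of {"block": int, "bytes": {addr: value}, "open": bool}

    def write(self, addr, value):
        block = addr // self.BLOCK
        for entry in self.entries:
            if entry["open"] and entry["block"] == block:
                entry["bytes"][addr] = value   # merge into an entry that is still open
                return
        self.entries.append({"block": block, "bytes": {addr: value}, "open": True})

    def wmb(self):
        for entry in self.entries:
            entry["open"] = False              # close valid entries to further merging

    def drain(self):
        drained = [(e["block"], dict(e["bytes"])) for e in self.entries]
        self.entries.clear()                   # entries leave in FIFO order
        return drained

def demo(use_wmb):
    buf = WriteBuffer()
    buf.write(0, "a")
    buf.write(8, "b")      # merges with the first write
    if use_wmb:
        buf.wmb()
    buf.write(16, "c")     # after a WMB this must start a new entry
    return buf.drain()

with_wmb = demo(True)
without_wmb = demo(False)
```

With the WMB, the write to offset 16 drains as a separate, later entry; without it, all three writes merge into one entry.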


Implementations may allow reads of x to hit (by physical address) on pending writes in a write 
buffer, even before the writes to x reach the coherency point for x. If this is done, it is still true 
that no earlier value of x may subsequently be delivered to the processor that took the hit on 
the write buffer value. 


Virtual data caches are allowed to deliver data before doing address translation, but only if 
there cannot be a pending write under a synonym virtual address. Lack of a write-buffer match 
on untranslated address bits is sufficient to guarantee this. 


Virtual data caches must invalidate or otherwise become coherent with the new value when- 
ever a PALcode routine is executed that affects the validity, fault behavior, protection 
behavior, or virtual-to-physical mapping specified for one or more pages. Becoming coherent 
can be delayed until the next subsequent MB instruction or TB fill (using the new mapping) if 
the implementation of the PALcode routine always forces a subsequent TB fill. 
5.7 Arithmetic Traps


Alpha implementations are allowed to execute multiple instructions concurrently and to for- 
ward results from one instruction to another. Thus, when an arithmetic trap is detected, the PC 
may have advanced an arbitrarily large number of instructions past the instruction T (calculat- 
ing result R) whose execution triggered the trap. 


When the trap is detected, any or all of these subsequent instructions may run to completion 
before the trap is actually taken. The set of instructions subsequent to T that complete before 
the trap is taken are collectively called the trap shadow of T. The PC pushed on the stack when 
the trap is taken is the PC of the first instruction past the trap shadow. 


The instructions in the trap shadow of T may use the UNPREDICTABLE result R of T, they 
may generate additional traps, and they may completely change the PC (branches, JSR). 


Thus, by the time a trap is taken, the PC pushed on the stack may bear no useful relationship 
to the PC of the trigger instruction T, and the state visible to the programmer may have been 
updated using the UNPREDICTABLE result R. If an instruction in the trap shadow of T uses 
R to calculate a subsequent register value, that register value is UNPREDICTABLE, even 
though there may be no trap associated with the subsequent calculation. Similarly: 


• If an instruction in the trap shadow of T stores R or any subsequent UNPREDICTABLE
result, the stored value is UNPREDICTABLE.


• If an instruction in the trap shadow of T uses R or any subsequent UNPREDICTABLE
result as the basis of a conditional or calculated branch, the branch target is
UNPREDICTABLE.


• If an instruction in the trap shadow of T uses R or any subsequent UNPREDICTABLE
result as the basis of an address calculation, the memory address actually accessed is
UNPREDICTABLE.
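
The way an UNPREDICTABLE result spreads through a trap shadow can be sketched as a simple taint propagation. The sentinel `UNPREDICTABLE`, the function `run_shadow`, and the tuple encoding of instructions are hypothetical and exist only for this illustration; stores, branches, and address calculations are not modeled.

```python
import operator

UNPREDICTABLE = object()  # sentinel standing in for an UNPREDICTABLE value


def run_shadow(regs, shadow):
    """Apply trap-shadow instructions to a register map, propagating UNPREDICTABLE.

    Each instruction is a tuple (dest, op, src_a, src_b), where op is a callable.
    """
    regs = dict(regs)
    for dest, op, src_a, src_b in shadow:
        if regs[src_a] is UNPREDICTABLE or regs[src_b] is UNPREDICTABLE:
            # The taint spreads even though this instruction itself does not trap.
            regs[dest] = UNPREDICTABLE
        else:
            regs[dest] = op(regs[src_a], regs[src_b])
    return regs


# r1 holds the result R of the trapping instruction T.
start = {"r1": UNPREDICTABLE, "r2": 5, "r3": 7, "r4": 0, "r5": 0, "r6": 0}
shadow = [
    ("r4", operator.add, "r1", "r2"),  # uses R directly
    ("r5", operator.add, "r2", "r3"),  # independent of R
    ("r6", operator.mul, "r4", "r3"),  # uses a value derived from R
]
result = run_shadow(start, shadow)
```

Here `r4` and `r6` end up UNPREDICTABLE, while `r5` keeps a well-defined value because it never depends on R.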


Software can follow the rules in Section 4.7.7.3 to reliably bound how far the PC may advance
before taking a trap and how far an UNPREDICTABLE result may propagate, or to continue
from a trap by supplying a well-defined result R within an arithmetic trap handler. For
arithmetic instructions that do not use the /S exception completion qualifier, software can
reliably produce that behavior by inserting TRAPB instructions at appropriate points.
5.8 \Revision History


Revision 7.0, November 10, 1997 


1. Alpha AXP -> Alpha
2. Added ECO 98, the new memory model
3. Added ECO 97, WMB changes
4. Added ECO 102, LDx_L and STx_C semantic changes
5. Added ECO 106, A semantic change to the Alpha memory model
6. Removed section 5.7, Arithmetic Traps, as obsolete and covered in Chapter 4


Revision 6.0, December 1994 


1. "undefined" in Arithmetic Traps -> UNPREDICTABLE as undefined is a magic word.
2. "Reference" -> "access" as they are synonymous
3. "do an MB" -> "execute an MB"
4. Global addition of ‘CALL_PAL’ to all IMB (CALL_PAL IMB)
5. Added ECO 68
6. Alpha -> Alpha AXP
7. Physical memory space -> physical address space


Revision 5.0, May 12, 1992 


1. Changed DRAINT to TRAPB
2. Converted to SDML
3. Generalized OS specific PALcode instructions
4. Generalized OS specific multiprocessor context switching


Revision 4.0, March 29, 1991 


1. Added Litmus Test 9
2. Explain what an excess data transfer is
3. Correct typing error in code sequence example for modification of atomic data structure
4. Add MB instructions to second illustrative example that specifies use of MB for multiple
   processor context switch
5. Note that MB and CALL_PAL IMB do not guarantee timeliness
6. Removed reference to byte when specifying granularity of data transfer widths
7. Made minor changes to correct use of capitals and remove repeated words in the Litmus
   Test section


Revision 3.0, March 2, 1990
1. Complete rewrite of data sharing 


2. Complete rewrite of read/write ordering 


Revision 2.0, October 4, 1989 
1. Total rewrite 
2. Memory, buffer, I/O spaces removed; Physical memory regions added 
3. SWP, FREEZE, and THAW removed; LDQ/L and STQ/C added 
4. FAS removed; MB and NUDGE added 
5. DRAIN and WAIT removed; DRAINT and /Semi-precise added 


Revision 1.0, May 23, 1989 


1. First Review Distribution\ 
Chapter 6


Common PALcode Architecture (I) 


6.1 PALcode 


In a family of machines, both users and operating system developers require functions to be 
implemented consistently. When functions conform to a common interface, the code that uses 
those functions can be used on several different implementations without modification. 


These functions range from the binary encoding of the instruction and data to the exception 
mechanisms and synchronization primitives. Some of these functions can be implemented cost 
effectively in hardware, but others are impractical to implement directly in hardware. These 
functions include low-level hardware support functions such as Translation Buffer miss fill 
routines, interrupt acknowledge, and vector dispatch. They also include support for privileged 
and atomic operations that require long instruction sequences. 


In the VAX, these functions are generally provided by microcode. This is not seen as a
problem because the VAX architecture lends itself to a microcoded implementation.


One of the goals of Alpha architecture is to implement functions consistently without micro- 
code. However, it is still desirable to provide an architected interface to these functions that 
will be consistent across the entire family of machines. The Privileged Architecture Library 
(PALcode) provides a mechanism to implement these functions without microcode. 


Note: 


\The hardware development groups provide and maintain the standard PALcode for a 
given implementation. The PALcode may be in ROM or loaded into RAM from a console 
load device. Many of the same trade-offs exist for PALcode that exist for microcode 
around patching, loading, and booting. Also, operating systems can provide their own 
PALcode rather than use the version provided by the hardware group.\ 


6.2 PALcode Instructions and Functions 
PALcode is used to implement the following functions: 


• Instructions that require complex sequencing as an atomic operation

• Instructions that require VAX style interlocked memory access

• Privileged instructions

• Memory management control, including translation buffer (TB) management

• Context swapping

• Interrupt and exception dispatching

• Power-up initialization and booting

• Console functions

• Emulation of instructions with no hardware support


The Alpha architecture lets these functions be implemented in standard machine code that is 
resident in main memory. PALcode is written in standard machine code with some implemen- 
tation-specific extensions to provide access to low-level hardware. This lets an Alpha 
implementation make various design trade-offs based on the hardware technology being used 
to implement the machine. The PALcode can abstract these differences and make them invisi- 
ble to system software. 


For example, in a MOS VLSI implementation, a small (32-entry) fully associative TB can be
the right match to the media, given that chip area is a costly resource. In an ECL version, a 
large (1024 entry) direct-mapped TB can be used because it will use RAM chips and does not 
have fast associative memories available. This difference would be handled by implementa- 
tion-specific versions of the PALcode on the two systems, both versions providing transparent 
TB miss service routines. The operating system code would not need to know there were any 
differences. 
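

The idea that implementation-specific fill routines hide the TB organization can be sketched as follows. The classes `FullyAssociativeTB` and `DirectMappedTB`, the helper `translate`, the oldest-first eviction policy, and the dictionary used as a page table are all assumptions made for illustration; real TB fill routines are implementation-specific PALcode.

```python
class FullyAssociativeTB:
    """Small TB in which any entry can hold any translation."""

    def __init__(self, entries=32):
        self.capacity = entries
        self.slots = {}  # vpn -> pfn, kept in insertion order

    def lookup(self, vpn):
        return self.slots.get(vpn)

    def fill(self, vpn, pfn):
        if vpn not in self.slots and len(self.slots) >= self.capacity:
            # Evict the oldest entry (a hypothetical replacement policy).
            self.slots.pop(next(iter(self.slots)))
        self.slots[vpn] = pfn


class DirectMappedTB:
    """Large TB in which each virtual page maps to exactly one slot."""

    def __init__(self, entries=1024):
        self.size = entries
        self.slots = [None] * entries  # each slot holds (vpn, pfn) or None

    def lookup(self, vpn):
        slot = self.slots[vpn % self.size]
        if slot is not None and slot[0] == vpn:
            return slot[1]
        return None

    def fill(self, vpn, pfn):
        self.slots[vpn % self.size] = (vpn, pfn)


def translate(tb, page_table, vpn):
    """What system software observes: a translation, whatever the TB organization."""
    pfn = tb.lookup(vpn)
    if pfn is None:           # TB miss: the fill routine runs transparently
        pfn = page_table[vpn]
        tb.fill(vpn, pfn)
    return pfn


page_table = {0: 100, 5: 7, 1024: 200}
small_tb = FullyAssociativeTB()
large_tb = DirectMappedTB()
sequence = [0, 1024, 0, 5, 1024]
small_results = [translate(small_tb, page_table, vpn) for vpn in sequence]
large_results = [translate(large_tb, page_table, vpn) for vpn in sequence]
```

Pages 0 and 1024 collide in the direct-mapped model and are refilled on each alternation, yet both organizations return the same translations to the caller.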


An Alpha Privileged Architecture Library (PALcode) of routines and environments is supplied 
by DIGITAL. Other systems may use a library supplied by DIGITAL or architect and imple- 
ment a different library of routines. Alpha systems are required to support the replacement of 
PALcode defined by DIGITAL with an operating system-specific version. 


Note: 


\The register conventions used are based on the Alpha calling standard Version 1.0. The 
PALcode library will track the Alpha calling standard changes as long as that is practical.\ 


6.3 PALcode Environment 


The PALcode environment differs from the normal environment in the following ways: 
• Complete control of the machine state.

• Interrupts are disabled.

• Implementation-specific hardware functions are enabled, as described below.

• I-stream memory management traps are prevented (by disabling I-stream mapping,
mapping PALcode with a permanent TB entry, or by other mechanisms).


Complete control of the machine state allows all functions of the machine to be controlled.
Disabling interrupts allows the system to provide multi-instruction sequences as atomic
operations. Enabling implementation-specific hardware functions allows access to low-level
system hardware. Preventing I-stream memory management traps allows PALcode to implement
memory management functions such as translation buffer fill.
6.4 Special Functions Required for PALcode


PALcode uses the Alpha instruction set for most of its operations. A small number of
additional functions are needed to implement the PALcode. Five opcodes are reserved to
implement PALcode functions: PAL19, PAL1B, PAL1D, PAL1E, and PAL1F. These
instructions produce a trap if executed outside the PALcode environment.


• PALcode needs a mechanism to save the current state of the machine and dispatch into
PALcode.


• PALcode needs a set of instructions to access hardware control registers.


• PALcode needs a hardware mechanism to transition the machine from the PALcode
environment to the non-PALcode environment. This mechanism loads the PC, enables
interrupts, enables mapping, and disables PALcode privileges.
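

The entry and exit mechanisms listed above can be pictured as a pair of state transitions. The following is a sketch only: `CpuState`, `enter_palcode`, `exit_palcode`, and the single `saved_pc` field are hypothetical, and disabling I-stream mapping on entry is just one of the mechanisms Section 6.3 permits.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    pc: int = 0
    interrupts_enabled: bool = True
    istream_mapping: bool = True
    pal_mode: bool = False
    saved_pc: int = 0


def enter_palcode(cpu, entry_pc):
    """Save the current state and dispatch into PALcode."""
    cpu.saved_pc = cpu.pc
    cpu.pc = entry_pc
    cpu.interrupts_enabled = False  # PALcode runs with interrupts disabled
    cpu.istream_mapping = False     # assumed way of preventing I-stream traps
    cpu.pal_mode = True


def exit_palcode(cpu):
    """Load the PC, enable interrupts and mapping, and drop PALcode privileges."""
    cpu.pc = cpu.saved_pc
    cpu.interrupts_enabled = True
    cpu.istream_mapping = True
    cpu.pal_mode = False


cpu = CpuState(pc=0x2000)
enter_palcode(cpu, entry_pc=0x8000)
inside = (cpu.pc, cpu.interrupts_enabled, cpu.istream_mapping, cpu.pal_mode)
exit_palcode(cpu)
outside = (cpu.pc, cpu.interrupts_enabled, cpu.istream_mapping, cpu.pal_mode)
```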


An Alpha implementation may also choose to provide additional functions to simplify or 
improve performance of some PALcode functions. The following are some examples: 


• An Alpha implementation may include a read/write virtual function that allows PALcode
to perform mapped memory accesses using the mapping hardware rather than providing
the virtual-to-physical translation in PALcode routines. PALcode may provide a
special function to do physical reads and writes and have the Alpha loads and stores
continue to operate on virtual addresses in the PALcode environment.


• An Alpha implementation may include hardware assists for various functions, such as
saving the virtual address of a reference on a memory management error rather than
having to generate it by simulating the effective address calculation in PALcode.


• An Alpha implementation may include private registers so it can function without
having to save and restore the native general registers.


6.5 PALcode Effects on System Code 


PALcode will have one effect on system code. Because PALcode may reside in main memory 
and maintain privileged data structures in main memory, the operating system code that allo- 
cates physical memory cannot use all of physical memory. 


The amount of memory PALcode requires is small, so the loss to the system is negligible. 


6.6 PALcode Replacement 


Alpha systems are required to support the replacement of PALcode supplied by DIGITAL 
with an operating system-specific version. The following functions must be implemented in 
PALcode, not directly in hardware, to facilitate replacement with different versions. 


• Translation Buffer fill. Different operating systems will want to replace the Translation
Buffer (TB) fill routines. The replacement routines will use different data structures.
Page tables will not be present in these systems. Therefore, no portion of the TB fill
flow that would change with a change in page tables may be placed in hardware, unless
it is placed in a manner that can be overridden by PALcode.


• Process structure. Different operating systems might want to replace the process context
switch routines. The replacement routines will use different data structures. The
HWPCB or PCB will not be present in these systems. Therefore, no portion of the context
switching flows that would change with a change in process structure may be
placed in hardware.


PALcode can be viewed as consisting of the following somewhat intertwined components:

• Chip/architecture component

• Hardware platform component

• Operating system component


PALcode should be written modularly to facilitate the easy replacement or conditional
building of each component. Such a practice simplifies the integration of CPU hardware, system
platform hardware, console firmware, operating system software, and compilers.


PALcode subsections that are commonly subject to modification include:

• Translation Buffer fill

• Process structure and context switch

• Interrupt and exception frame format and routine dispatch

• Privileged PALcode instructions

• Transitions to and from console I/O mode

• Power-up reset


6.7 Required PALcode Instructions 


The PALcode instructions listed in Table 6-1 and Appendix C must be recognized by
mnemonic and opcode in all operating system implementations, but the effect of each instruction is
dependent on the implementation. DIGITAL defines the operation of these PALcode
instructions for operating system implementations supplied by DIGITAL.


Table 6-1: PALcode Instructions that Require Recognition


Mnemonic      Name

BPT           Breakpoint trap
BUGCHK        Bugcheck trap
CSERVE        Console service
GENTRAP       Generate trap
RDUNIQUE      Read unique value
SWPPAL        Swap PALcode
WRUNIQUE      Write unique value


The PALcode instructions listed in Table 6-2 and described in the following sections must be
supported by all Alpha implementations:


Table 6-2: Required PALcode Instructions


Mnemonic      Type            Operation

DRAINA        Privileged      Drain aborts
HALT          Privileged      Halt processor
IMB           Unprivileged    I-stream memory barrier




6.7.1 Drain Aborts 


Format: 


CALL_PAL DRAINA !PALcode format


Operation: 


IF PS<CM> NE 0 THEN
    {privileged instruction exception}


{Stall instruction issuing until all prior
 instructions are guaranteed to complete
 without incurring aborts.}


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL DRAINA Drain Aborts 


Description: 


If aborts are deliberately generated and handled (such as nonexistent memory aborts while siz- 
ing memory or searching for I/O devices), the DRAINA instruction forces any outstanding 
aborts to be taken before continuing. 


Aborts are necessarily implementation dependent. DRAINA stalls instruction issue at least 
until all previously issued instructions have completed and any associated aborts have been 
signaled, as follows: 


• For operate instructions, this usually means stalling until the result register has been
written.


• For branch instructions, this usually means stalling until the result register and PC have
been written.


• For load instructions, this usually means stalling until the result register has been
written.


• For store instructions, this usually means stalling until at least the first level in a
potentially multilevel memory hierarchy has been written.


For load instructions, DRAINA does not necessarily guarantee that the unaccessed portions of 
a cache block have been transferred error free before continuing. 


For store instructions, DRAINA does not necessarily guarantee that the ultimate target loca- 
tion of the store has received error-free data before continuing. An implementation-specific 
technique must be used to guarantee the ultimate completion of a write in implementations 
that have multilevel memory hierarchies or store-and-forward bus adapters. 
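

The use of DRAINA while sizing memory can be sketched with a toy model in which aborts are signaled some time after the offending reference. `ToyMachine`, `probe`, `draina`, and `size_memory` are hypothetical names; actual abort delivery is implementation dependent, as noted above.

```python
class ToyMachine:
    """Model in which an abort is detected at issue but signaled later."""

    def __init__(self, memory_size):
        self.memory_size = memory_size
        self.pending_aborts = []  # aborts detected but not yet signaled
        self.taken_aborts = []    # aborts that have been delivered

    def probe(self, addr):
        # A reference to nonexistent memory posts an abort asynchronously.
        if addr >= self.memory_size:
            self.pending_aborts.append(addr)

    def draina(self):
        # Stall until prior instructions complete; outstanding aborts are taken.
        self.taken_aborts.extend(self.pending_aborts)
        self.pending_aborts.clear()


def size_memory(machine, step, limit):
    """Probe upward in `step` increments; return the first address that aborts."""
    for addr in range(0, limit, step):
        machine.probe(addr)
        machine.draina()  # force any abort from this probe to be taken now
        if addr in machine.taken_aborts:
            return addr
    return limit


machine = ToyMachine(memory_size=0x4000)
detected_size = size_memory(machine, step=0x1000, limit=0x10000)
```

Without the `draina()` call in the loop, the model would move on to the next probe while the abort was still pending, and the abort could no longer be attributed to a specific address.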
6.7.2 Halt


Format: 
CALL_PAL HALT !PALcode format


Operation: 


IF PS<CM> NE 0 THEN
    {privileged instruction exception}


CASE {halt action} OF
    ! Operating System or Platform dependent choice
    halt:               {halt}
    restart/boot/halt:  {restart/boot/halt}
    boot/halt:          {boot/halt}
    debugger/halt:      {debugger/halt}
    restart/halt:       {restart/halt}
ENDCASE


Exceptions:


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL HALT Halt Processor 


Description: 


The HALT instruction stops normal instruction processing and initiates some other operating 
system or platform-specific behavior, depending on the HALT action setting. The choice of 
behavior typically includes the initiation of a restart sequence, a system bootstrap, or entry 
into console mode. See Console Interface (III), Chapter 3.
6.7.3 Instruction Memory Barrier


Format: 
CALL_PAL IMB !PALcode format 


Operation: 


{Make instruction stream coherent with data stream} 


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL IMB I-stream Memory Barrier


Description: 


An IMB instruction must be executed after software or I/O devices write into the instruction 
stream or modify the instruction stream virtual address mapping, and before the new value is 
fetched as an instruction. An implementation may contain an instruction cache that does not 
track either processor or I/O writes into the instruction stream. The instruction cache and mem- 
ory are made coherent by an IMB instruction. 


If the instruction stream is modified and an IMB is not executed before fetching an instruction 
from the modified location, it is UNPREDICTABLE whether the old or new value is fetched. 


Software Note: 


In a multiprocessor environment, executing an IMB on one processor does not affect
instruction caches on other processors. Thus, a single IMB on one processor is 
insufficient to guarantee that all processors see a modification of the instruction stream. 


The cache coherency and sharing rules are described in Console Interface (III), Chapter 2. 
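

The IMB requirement, including the multiprocessor caveat in the Software Note, can be sketched with a toy instruction cache that does not track writes. `ToyProcessor` and its methods are hypothetical. Note one deliberate simplification: the architecture says a fetch without an IMB is UNPREDICTABLE, whereas this model always returns the stale value to make the hazard visible.

```python
class ToyProcessor:
    """Processor with a private I-cache that does not observe writes."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> instruction
        self.icache = {}      # private instruction cache

    def fetch(self, addr):
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, instr):
        self.memory[addr] = instr  # the I-cache is deliberately left untouched

    def imb(self):
        self.icache.clear()        # make the I-stream coherent with memory


memory = {0: "ADDQ"}
cpu0 = ToyProcessor(memory)
cpu1 = ToyProcessor(memory)
cpu0.fetch(0)
cpu1.fetch(0)

cpu0.store(0, "SUBQ")            # modify the instruction stream
stale_on_cpu0 = cpu0.fetch(0)    # no IMB yet: old value in this model
cpu0.imb()
fresh_on_cpu0 = cpu0.fetch(0)
stale_on_cpu1 = cpu1.fetch(0)    # the IMB on cpu0 did not affect cpu1
cpu1.imb()
fresh_on_cpu1 = cpu1.fetch(0)
```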
6.8 \Revision History


Revision 7.0, November 10, 1997 
1. Alpha AXP —> Alpha 


Revision 6.0, December, 1994 
1. Added ECO 72, HALT update 
2. Added ECO 73, IMB update 
3. Alpha —> Alpha AXP 
4. Change PALRES0..PALRES4 -> PAL19..PAL1F
5. Add ECO 39 information: CSERVE and SWPPAL to table 6-1 


Revision 5.0, May 12, 1992 
1. Added list of recognition-required PALcode instructions 
2. Added DRAINA to list of required PALcode instructions 
3. Changed privileges enabled to complete control of the machine state 
4. PALcode override for TB fill routines 
5. Added HALT and IMB PALcode instructions 


Revision 4.1, August 12, 1991 
1. Created the chapter from Sections 1.1 through 1.6 of the V4.n SRM\ 
Chapter 7


Console Subsystem Overview (I) 


On an Alpha system, underlying control of the system platform hardware is provided by a con- 
sole subsystem. The console subsystem: 


• Initializes, tests, and prepares the system platform hardware for Alpha system software.

• Bootstraps (loads into memory and starts the execution of) system software.

• Controls and monitors the state and state transitions of each processor in a
multiprocessor system.

• Provides services to system software that simplify system software control of and
access to platform hardware.

• Provides a means for a console operator to monitor and control the system.


The console subsystem interacts with system platform hardware to accomplish the first three 
tasks. The actual mechanisms of these interactions are specific to the platform hardware;
however, the net effects are common to all systems.


The console subsystem interacts with system software once control of the system platform 
hardware has been transferred to that software. 


The console subsystem interacts with the console operator through a virtual display device or 
console terminal. The console operator may be a person or a management application. 


DIGITAL Restricted Distribution 




7.1 \Revision History 


Revision 7.0, November 10, 1997 
1. Alpha AXP —> Alpha 


Revision 6.0, December, 1994 
1. Alpha —-> Alpha AXP 


Revision 5.0, May 12, 1992 
1. Created/copied from Alpha Architecture Handbook\ 


Chapter 8 


Input/Output Overview (I)


Conceptually, Alpha systems can consist of processors, memory, a processor-memory inter- 
connect (PMI), I/O buses, bridges, and I/O devices. 


Figure 8-1 shows the Alpha system overview. 


Figure 8-1: Alpha System Overview

[Figure not reproduced. Labeled elements: Processor-Memory Interconnect, I/O Bus, and
three I/O Devices.]

As shown in Figure 8-1, processors, memory, and possibly I/O devices are connected by a
PMI.










A bridge connects an I/O bus to the system, either directly to the PMI or through another I/O
bus. The I/O bus address space is available to the processor either directly or indirectly.
Indirect access is provided through either an I/O mailbox or an I/O mapping mechanism. The I/O
mapping mechanism includes provisions for mapping between PMI and I/O bus addresses and
access to I/O bus operations.


Alpha I/O operations can include:

• Accesses between the processor and an I/O device across the PMI

• Accesses between the processor and an I/O device across an I/O bus

• DMA accesses: I/O devices initiating reads and writes to memory

• Processor interrupts requested by devices

• Bus-specific I/O accesses

• \Targettable interrupts\
8.1 \Revision History


Revision 7.0, November 10, 1997 


1. Alpha AXP -> Alpha


Revision 6.0, December, 1994

1. Shortened to commonality


Revision 5.0, May 12, 1992

1. Changed ‘widget’ to ‘device’
2. Split chapter such that Sections 1.1 through the text part of 1.6.3 are now external
   Chapter 8 of the Common Section, Table 1-3 and all text/tables through 1.6.3.2
   (Futurebus+...) are placed in Appendix D, and 1.7 (Implementation Considerations) to
   end of chapter are internal (backslash) Chapter 8 of the Common Section
3. Changed hex IPLs to decimal
4. Made specified internal references external
5. Added ECO #22
6. Converted to SDML
7. Made all ‘unpredictable’ to ‘UNPREDICTABLE’
8. Changed SLL to SRL under ‘check error:’ in remote read pseudocode
9. Removed all revision history prior to Rev 4.0, 29 March 1991


Revision 4.1, August 12, 1991

1. Renumbered Chapter to #11 with inclusion of Console ECO #15


Revision 4.0, March 29, 1991

1. Inclusion in REV 4.0 of the SRM numbering to assume SRM version values\


Common Architecture Index 


A 


Aborts, forcing, 6-6 
ACCESS(x,y) operator, 3-7 


Add instructions 
add longword, 4-25 
add quadword, 4-27 
add scaled longword, 4—26 
add scaled quadword, 4—28 
See also Floating-point operate 


ADDF instruction, 4-110 
ADDG instruction, 4-110 
ADDL instruction, 4-25 
ADDQ instruction, 4-27 
Address space match (ASM) 
virtual cache coherency, 5—4 
Address space number (ASN) register 
virtual cache coherency, 5-4 
ADDS instruction, 4-111 
ADDT instruction, 4-111 
AFTER, defined for memory access, 5-12 
ALIGNED data objects, 1-8 


Alignment 
atomic byte, 5-3 
atomic longword, 5-2 
atomic quadword, 5-2 
D_floating, 2-6 
F_floating, 2-4 
G_floating, 2-5 
longword, 2-2 
longword integer, 2—12 
quadword, 2-3 
quadword integer, 2-12 
S_floating, 2-8 
T_floating, 2-9 
X_floating, 2-10 
Alpha architecture 
addressing, 2-1 
overview, 1-1 
porting operating systems to, 1-1 
programming implications, 5-1 
registers, 3-1 
security, 1-7
See also Conventions 


Alpha privileged architecture library. See PALcode 
AMASK (Architecture mask) instruction, 4-134 
AND instruction, 4-42 

AND operator, 3-7 

Architecture extensions, AMASK with, 4-134 
ARITH_RIGHT_SHIFT(x,y) operator, 3-7 


Arithmetic instructions, 4-24 
See also specific arithmetic instructions 
Arithmetic left shift instruction, 4-41 


Arithmetic traps 
denormal operand exception disabling, 4—81 
disabling, 4-78 
division by zero, 4-77, 4-81
division by zero, disabling, 4-81 
dynamic rounding mode, 4—80 
inexact result, 4-78, 4-81 
inexact result, disabling, 4-80 
integer overflow, 4—78, 4-81 
invalid operation, 4-76, 4-81 
invalid operation, disabling, 4-81 
overflow, 4-77, 4-81 
overflow, disabling, 4-81 
programming implications for, 5—30 
TRAPB instruction with, 4-146 
underflow, 4-78, 4-81 
underflow to zero, disabling, 4—80 
underflow, disabling, 4—-80 


Atomic access, 5-3 


Atomic operations 


accessing longword datum, 5-2 

accessing quadword datum, 5-2 

updating shared data structures, 5—7 

using load locked and store conditional, 5—7 


BEFORE, defined for memory access, 5-12 
BEQ instruction, 4-20 
BGE instruction, 4—20 
BGT instruction, 4-20 
BIC instruction, 4-42


Big-endian addressing, 2-13
byte operation examples, 4-54 
extract byte with, 4—51 
insert byte with, 4-55 
load F_floating with, 4-91 
load long/quad locked with, 4—9 
load S_floating with, 4-93 
mask byte with, 4—57 
store byte/word with, 4—15 
store F_floating with, 4—95 
store long/quad conditional with, 4-12 
store long/quad with, 4-15 
store S_floating with, 4-97 


Big-endian data types, X_floating, 2-10 

BIS instruction, 4—42 

BLBC instruction, 4-20 

BLBS instruction, 4—20 

BLE instruction, 4—20 

BLT instruction, 4-20 

BNE instruction, 4—20 

Boolean instructions, 4-41 
logical functions, 4-42 

BPT (PALcode) instruction 
required recognition of, 6—4 

bpt (PALcode) instruction 
required recognition of, 6—4 

BR instruction, 4—21 


Branch instructions, 4-18 


backward conditional, 4—20 
conditional branch, 4—20 
floating-point, summarized, 4—99 
format of, 3-12 

forward conditional, 4—20 
unconditional branch, 4-21 

See also Control instructions 


Branch prediction model, 4—18 
Branch prediction stack, with BSR instruction, 4-21
BSR instruction, 4—21 
BUGCHK (PALcode) instruction 
required recognition of, 6-4 
bugchk (PALcode) instruction 
required recognition of, 6-4 
Byte data type, 2-1 
atomic access of, 5-3 
Byte manipulation, 1-2 
Byte manipulation instructions, 4-47 




BYTE_ZAP(x,y) operator, 3-7 


C


/C opcode qualifier 
IEEE floating-point, 4-67
VAX floating-point, 4-67 
C opcode qualifier, 4-67 
Cache coherency 
barrier instructions for, 5—25 
defined, 5-2 
in multiprocessor environment, 5-6 
Caches 


MB and IMB instructions with, 5-25 
requirements for, 5-5 
with powerfail/recovery, 5-5 


CALL_PAL (call privileged architecture library) 
instruction, 4-136
CASE operator, 3-8 
Causal loops, 5-15 
CFLUSH (PALcode) instruction 
ECB compared with, 4-139 
Changed datum, 5-6 


CMOVEQ instruction, 4-43


CMOVGE instruction, 4—43 
CMOVGT instruction, 4—43 
CMOVLBC instruction, 4—43 
CMOVLE instruction, 4-43 
CMOVLT instruction, 4-43 
CMOVNE instruction, 4-43 
CMPBGE instruction, 4-49 
CMPEQ instruction, 4-29 
CMPGLE instruction, 4-112 
CMPGLT instruction, 4-112 
CMPLE instruction, 4—29 
CMPLT instruction, 4-29 
CMPTEQ instruction, 4-113 
CMPTLE instruction, 4-113 
CMPTLT instruction, 4-113 
CMPTUN instruction, 4-113 
CMPULE instruction, 4-30 
CMPULT instruction, 4-30 


Code scheduling
IMPLVER instruction with, 4-142 
CODEC, 4-154 


Coherency 
cache, 5—2 
memory, 5-1 
Compare instructions 
compare integer signed, 4—29 
compare integer unsigned, 4-30 
See also Floating-point operate 
Conditional move instructions, 4—43 
See also Floating-point operate
Console overview, 7-1 
Control instructions, 4-18 


Conventions 
code examples, 1-9 
extents, 1-8 
figures, 1-9 
instruction format, 3-10 
notation, 3-10
numbering, 1-7 
ranges, 1-8 

Count instructions 


Count leading zero, 4-31 
Count population, 4-32 
Count trailing zero, 4-33 


CPYS instruction, 4-105 
CPYSE instruction, 4-105 
CPYSN instruction, 4-105 


CSERVE (PALcode) instruction 
required recognition of, 6—4 
cserve (PALcode) instruction 
required recognition of, 6—4 
CTLZ instruction, 4-31 


CTPOP instruction, 4-32 
CTTZ instruction, 4-33 
CVTDG instruction, 4-116 
CVTGD instruction, 4-116 
CVTGF instruction, 4-116 
CVTGQ instruction, 4-114
CVTLQ instruction, 4-106
CVTQF instruction, 4-115
CVTQG instruction, 4-115
CVTQL instruction, 4-106
CVTQS instruction, 4-118
CVTQT instruction, 4-118
CVTST instruction, 4-120 
CVTTQ instruction, 4-117 
CVTTS instruction, 4-119 


D 


/D opcode qualifier 
FPCR (floating-point control register), 4-79 
IEEE floating-point, 4-67
D_floating data type, 2-5 
alignment of, 2-6 
mapping, 2-6 
restricted, 2-6 
Data caches 
ECB instruction with, 4-137 
WH64 instruction with, 4-147
Data format, overview, 1-3 
Data sharing (multiprocessor) 
synchronization requirement, 5-6
Data structures, shared, 5-6 
Data types 
byte, 2-1 
IEEE floating-point, 2-6 
longword, 2-2 
longword integer, 2-11 
quadword, 2-2 
quadword integer, 2-12 
unsupported in hardware, 2-12 
VAX floating-point, 2-3 
word, 2-1 
Denormal, 4-64 
Denormal operand exception disable, 4-81 
Denormal operands to zero, 4-81 
Depends order (DP), 5-15 
Dirty zero, 4-64 
DIV operator, 3-8 
DIVF instruction, 4-121 
DIVG instruction, 4-121 
DIVS instruction, 4-122 
DIVT instruction, 4-122 
DNOD bit. See Denormal operand exception disable 
DNZ. See Denormal operands to zero 
DP. See Depends order 
DRAINA (PALcode) instruction, required, 6—5 
draina (PALcode) instruction, required, 6-5 
DYN bit. See Arithmetic traps, dynamic rounding 
mode 
DZE bit. See Arithmetic traps, division by zero 


DZED bit. See Trap disable bits, division by zero 


E 


ECB (Evict data cache block) instruction, 4—137 
CFLUSH (PALcode) instruction with, 4-139 

EQV instruction, 4-42 

EXCB (exception barrier) instruction, 4-139 
with FPCR, 4-84 

Exception handlers 
TRAPB instruction with, 4-146 

Exceptions 


F31 with, 3-2 
R31 with, 3-1 


EXTBL instruction, 4-51 
EXTLH instruction, 4-51 
EXTLL instruction, 4-51
EXTQH instruction, 4—51 
EXTQL instruction, 4—51 
Extract byte instructions, 4-51 
EXTWH instruction, 4-51 
EXTWL instruction, 4—51 


F


F_floating datatype, 2-3 
alignment of, 2—4 
compared to IEEE S_floating, 2-8 
MAX/MIN, 4-65 


FBEQ instruction, 4—100 

FBGE instruction, 4—100 

FBGT instruction, 4—100 

FBLE instruction, 4-100 

FBLT instruction, 4—100 

FBNE instruction, 4-100 

FCMOVEQ instruction, 4-107 

FCMOVGE instruction, 4-107 

FCMOVGT instruction, 4-107 

FCMOVLE instruction, 4—107 

FCMOVLT instruction, 4-107 

FCMOVNE instruction, 4-107 

FETCH (prefetch data) instruction, 4-140 

FETCH_M (prefetch data, modify intent) instruction, 
4-140 

Finite number, Alpha, contrasted with VAX, 4-63 

Floating-point branch instructions, 4—99 


Floating-point control register (FPCR) 
accessing, 4-82 
at processor initialization, 4—83 
bit descriptions, 4—80 
instructions to read/write, 4-109 
operate instructions that use, 4—102 
saving and restoring, 4-83 
trap disable bits in, 4-78
Floating-point convert instructions, 3-14 
Fa field requirements, 3-14 
Floating-point format, number representation
(encodings), 4-65


Floating-point instructions 
branch, 4—99 


faults, 4-62 


function field format, 4—84 
introduced, 4—62 

memory format, 4-90 
operate, 4—102 

rounding modes, 4-66 
terminology, 4-63
trapping modes, 4-69 
traps, 4-62 
Floating-point load instructions, 4—90 
load F_floating, 4-91 
load G_floating, 4-92 
load S_floating, 4-93 
load T_floating, 4—94 
with non-finite values, 4—90 
Floating-point operate instructions, 4—102 

add (IEEE), 4-111 
add (VAX), 4-110 
compare (IEEE), 4-113 
compare (VAX), 4-112 
conditional move, 4—107 
convert IEEE floating to integer, 4—117 
convert integer to IEEE floating, 4-118 
convert integer to integer, 4-106 
convert integer to VAX floating, 4—115 
convert S_floating to T_floating, 4-119 
convert T_floating to S_floating, 4-120 
convert VAX floating to integer, 4-114 
convert VAX floating to VAX floating, 4-116 
copy sign, 4-105 
divide (IEEE), 4—122 
divide (VAX), 4-121 
format of, 3-13 
from integer moves, 4—125 
move from/to FPCR, 4-109 
multiply (IEEE), 4-128
multiply (VAX), 4-127
subtract (IEEE), 4-132
subtract (VAX), 4-131 
to integer moves, 4—123 
unused function codes with, 3-14 


Floating-point registers, 3-2 
Floating-point single-precision operations, 4—62 
Floating-point store instructions, 4—90 

store F_floating, 4-95 

store G_floating, 4-96 

store S_floating, 4-97 


store T_floating, 4—98 
with non-finite values, 4—90 
Floating-point support 
IEEE, 2-6 
IEEE standard 754-1985, 4-88 
instruction overview, 4—62 
longword integer, 2-11
operate instructions, 4—102 
optional, 4—2 
quadword integer, 2-12 
rounding modes, 4-66 
single-precision operations, 4-62
trap modes, 4-69 
VAX, 2-3 
Floating-point to integer move, 4-123
Floating-point to integer move instructions, 3-14 


Floating-point trapping modes, 4—69 
See also Arithmetic traps 
FPCR. See Floating-point control register
FTOIS instruction, 4-123 
FTOIT instruction, 4-123 


G 


G_floating data type, 2-4 
alignment of, 2-5 
mapping, 2-5 
MAX/MIN, 4-65 
GENTRAP (PALcode) instruction 
required recognition of, 6—4 
gentrap (PALcode) instruction 
required recognition of, 6—4 





H 


HALT (PALcode) instruction, required, 6-7 
halt (PALcode) instruction, required, 6-7 


I


I/O devices, DMA


MB and WMB with, 5-22 
reliably communicating with processor, 5-27 
shared memory locations with, 5-11 


I/O interface overview, 8-1 


IEEE floating-point 
format, 2-6 
FPCR (floating-point control register), 4-79 
function field format, 4-85 
NaN, 2-6 
S_floating, 2-7 
T_floating, 2-8 
X_floating, 2-9 
See also Floating-point instructions 
IEEE floating-point instructions 
add instructions, 4-111
compare instructions, 4-113 
convert from integer instructions, 4-118 
convert S_floating to T_floating, 4-119 
convert T_floating to S_floating, 4-120 
convert to integer instructions, 4-117 
divide instructions, 4-122 
from integer moves, 4-125 
multiply instructions, 4-128 
operate instructions, 4-102 
square root instructions, 4-130 
subtract instructions, 4-132 
to register moves, 4—123 


IEEE standard, 4—88 

IGN (ignore), 1-9 

IMB (PALcode) instruction, 5-23 
required, 6-8 
virtual I-cache coherency, 5-5 

imb (PALcode) instruction, required, 6-8 
IMP (implementation dependent), 1-9


IMPLVER (Implementation version) instruction, 
4-142


INE bit. See Arithmetic traps, inexact result 
INED bit. See Trap disable bits, inexact result trap 
Infinity, 4-64
conversion to integer, 4-88
INSBL instruction, 4-55
Insert byte instructions, 4-55
INSLH instruction, 4-55
INSLL instruction, 4-55
INSQH instruction, 4-55
INSQL instruction, 4-55
Instruction fetches (memory), 5-11 


Instruction formats 
branch, 3-12 
conventions, 3-10 
floating-point convert, 3-14 
floating-point operate, 3-13 
floating-point to integer move, 3-14 
memory, 3-11 
memory jump, 3-12 
operand values, 3-10 
operators, 3-6 
overview, 1-4 
PALcode, 3-14 
registers, 3-1 

Instruction set 
access type field, 3-5 
Boolean, 4-41
branch, 4-18 
byte manipulate, 4-47 
conditional move (integer), 4-43 
data type field, 3-6 
floating-point subsetting, 4-2 
integer arithmetic, 4-24
introduced, 1-6 
jump, 4-18 
load memory integer, 4-4 
miscellaneous, 4-133 
multimedia, 4-154 
name field, 3-5 
opcode qualifiers, 4-3 
operand notation, 3-5 
overview, 4-1 
shift, arithmetic, 4-46 
software emulation rules, 4—3 
store memory integer, 4—4 
VAX compatibility, 4-152 
See also Floating-point instructions 


Instructions, overview, 1—4 
INSWH instruction, 4—55 
INSWL instruction, 4—55 


Integer registers 
defined, 3-1 
R31 restrictions, 3-1
INV bit. See Arithmetic traps, invalid operation 
INVD bit. See Trap disable bits, invalid operation 
IOV bit. See Arithmetic traps, integer overflow 


I-stream 


coherency of, 6—8 
modifying physical, 5-5 
modifying virtual, 5—5 
PALcode with, 6-2 
with caches, 5-5 


ITOFF instruction, 4—125 
ITOFS instruction, 4-125 
ITOFT instruction, 4-125 


J 


JMP instruction, 4—22 
JSR instruction, 4-22 
JSR_COROUTINE instruction, 4—22 


Jump instructions, 4-18, 4-22 
branch prediction logic, 4-22 
coroutine linkage, 4-23 
return from subroutine, 4—22 
unconditional long jump, 4—23 
See also Control instructions 





L 


LDA instruction, 4-5
LDAH instruction, 4-5
LDBU instruction, 4-6
LDF instruction, 4-91
LDG instruction, 4-92
LDL instruction, 4-6
LDL_L instruction, 4-9
restrictions, 4-10


with processor lock register/flag, 4—10 
with STx_C instruction, 4—9 


LDQ instruction, 4—6 
LDQ_L instruction, 4-9 
restrictions, 4-10 


with processor lock register/flag, 4—10 
with STx_C instruction, 4—10 


LDQ_U instruction, 4-8 


LDS instruction, 4—93 
with FPCR, 4-84 
LDT instruction, 4-94 


LDWU instruction, 4-6 
LEFT_SHIFT(x,y) operator, 3-8 
lg operator, 3-8 
Literals, operand notation, 3-5
Litmus tests, shared data veracity, 5-17 


Load instructions 


emulation of, 4—3 

FETCH instruction, 4-140

Load address, 4—5 

Load address high, 4—5 

load byte, 4-6 

load longword, 4-6 

load quadword, 4-6 

load quadword locked, 4—10 

load sign-extended longword locked, 4—9 
load unaligned quadword, 4-8 

load word, 4—6 

multiprocessor environment, 5-6 
serialization, 4-143 

See also Floating-point load instructions 


Load memory integer instructions, 4-4 
LOAD_LOCKED operator, 3-8 
Load-locked, defined, 5—16 

Location, 5-11 

Location access constraints, 5-14 


Lock flag, per-processor 


defined, 3-2 
when cleared, 4-10 
with load locked instructions, 4—10 


Lock registers, per-processor 


defined, 3-2 
with load locked instructions, 4-10 


Lock variables, with WMB instruction, 4-150 
Logical instructions. See Boolean instructions 


Longword data type, 2-2 


alignment of, 2—12 
atomic access of, 5—2 


LSB (least significant bit), defined for floating-point, 
4-64 


M 


/M opcode qualifier, IEEE floating-point, 4—67 
MAP_F function, 2-4

MAP_S function, 2-7

MAP_x operator, 3-8 

Mask byte instructions, 4—57 

MAX, defined for floating-point, 4—65 
MAXS(x,y) operator, 3-8 

MAXSB8 instruction, 4—155 
MAXSW4 instruction, 4-155 
MAXU(x,y) operator, 3-8 

MAXUB8 instruction, 4—155 
MAXUW4 instruction, 4-155 
MB (Memory barrier) instruction, 4-143


compared with WMB, 4—150 
multiprocessors only, 4-143 

with DMA I/O, 5-22 

with LDx_L/STx_C, 4-14 

with multiprocessor D-stream, 5—22 
with shared data structures, 5-9 

See also IMB, WMB 


MBZ (must be zero), 1-9 


Memory access 


coherency of, 5-1 

granularity of, 5-2 

width of, 5-3 

with WMB instruction, 4-149 


Memory alignment, requirement for, 5—2 


Memory barrier instructions. See MB, IMB 
(PALcode), and WMB instructions 


Memory barriers, 5—22 
Memory instruction format, 3-11 
Memory jump instruction format, 3-12 
Memory management 

support in PALcode, 6-2 
Memory prefetch registers, defined, 3-3 
Memory-like behavior, 5-3 
MF_FPCR instruction, 4-109
MIN, defined for floating-point, 4-65 
MINS(x,y) operator, 3-8 
MINSB8 instruction, 4-155 
MINSW4 instruction, 4-155 
MINU(x,y) operator, 3-8 
MINUB8 instruction, 4-155
MINUW4 instruction, 4-155
Miscellaneous instructions, 4-133 


Move instructions (conditional). See Conditional 
move instructions 


MSKBL instruction, 4-57
MSKLH instruction, 4-57
MSKLL instruction, 4-57
MSKQL instruction, 4-57
MSKWH instruction, 4-57
MSKWL instruction, 4-57
MT_FPCR instruction, 4-109
synchronization requirement, 4-82
MULF instruction, 4-127
MULG instruction, 4-127


MULL instruction, 4-34 
with MULQ, 4-34 
MULQ instruction, 4-35
with MULL, 4-34
with UMULH, 4-35 


MULS instruction, 4-128 
MULT instruction, 4-128 
Multimedia instructions, 4-154 


Multiply instructions 
multiply longword, 4-34 
multiply quadword, 4—35 
multiply unsigned quadword high, 4-36
See also Floating-point operate 
Multiprocessor environment 
cache coherency in, 5-6 
context switching, 5-24 
I-stream reliability, 5-23
MB and WMB with, 5-22 
no implied barriers, 5-22 
read/write ordering, 5-10 
serialization requirements in, 4-143 
shared data, 5-6 


N 


NaN (Not-a-Number) 
conversion to integer, 4-88 
copying, generating, propagating, 4-89
defined, 2-6 
quiet, 4-64 
signaling, 4-64 
NATURALLY ALIGNED data objects, 1-8 


Non-finite number, 4—64 





Nonmemory-like behavior, 5-3 
NOT instruction, ORNOT with zero, 4-42 
NOT operator, 3-9 


O 


Opcode qualifiers 
default values, 4—3 
notation, 4-3 
See also specific qualifiers 


Operand expressions, 3-4 





Operand notation, defined, 3-4 
Operand values, 3-4 
Operate instruction format 
unused function codes with, 3-13 
Operate instructions, convert with integer overflow, 
4-78 
Operators, instruction format, 3-6 
OR operator, 3-9 
ORNOT instruction, 4-42 


Overlap 


with location access constraints, 5—14 
with processor issue constraints, 5-13 
with visibility, 5-14
OVF bit. See Arithmetic traps, overflow 
OVFD bit. See Trap disable bits, overflow disable 


P


Pack to bytes instructions, 4-158 


PALcode 
barriers with, 5—22 
CALL_PAL instruction, 4-136 
compared to hardware instructions, 6—1 
implementation-specific, 6-2 
instead of microcode, 6-1 
instruction format, 3-14 
overview, 6—1 
recognized instructions, 6—4 
replacing, 6-3 
required, 6-3 
required instructions, 6—5 
running environment, 6—2 
special functions function support, 6-3 


PALcode instructions, required privileged, 6-5 





PALcode instructions, required unprivileged, 6—5 
PCC_CNT, 3-3, 4-144 
PCC_OFF, 3-3, 4-144 
Performance tuning 
IMPLVER instruction with, 4-142 
PERR (Pixel error) instruction, 4—157 
Physical address space, described, 5-1 
PHYSICAL_ADDRESS operator, 3-9 


Pipelined implementations, using EXCB instruction
with, 4-139
Pixel error instruction, 4—157 
PKLB (Pack longwords to bytes) instruction, 4-158 
PKWB (Pack words to bytes) instruction, 4-158 
Prefetch data (FETCH instruction), 4-140 
PRIORITY_ENCODE operator, 3-9 
Privileged Architecture Library. See PALcode 
Processor communication, 5-15 


Processor cycle counter (PCC) register, 3-3 
RPCC instruction with, 4-144 
Processor issue constraints, 5-12 


Processor issue sequence, 5—12 


Program counter (PC) register, 3-1 
with EXCB instruction, 4—139 


Q 


Quadword data type, 2-2 
alignment of, 2—3, 2-12 
atomic access of, 5-2 
integer floating-point format, 2-12
T_floating with, 2-12 


R 


R31, restrictions, 3-1 

RAZ (read as zero), 1-9 

RC (read and clear) instruction, 4-153 

RDUNIQUE (PALcode) instruction 
required recognition of, 6—4 


Read/write ordering (multiprocessor), 5-10 


determining requirements, 5—10 
hardware implications for, 5-29 
memory location defined, 5—11 


Regions in physical address space, 5-1 





Registers, 3-1 
floating-point, 3-2 
integer, 3-1 
lock, 3-2 
memory prefetch, 3-3 
optional, 3-3 
processor cycle counter, 3-3 
program counter (PC), 3-1 
value when unused, 3-10 
VAX compatibility, 3-3 
See also specific registers 
Relational Operators, 3-9 


Representative result, 4—64 

RET instruction, 4—22 

RIGHT_SHIFT(x,y) operator, 3-9 

Rounding modes. See Floating-point rounding modes 


RPCC (read processor cycle counter) instruction, 
4-144 


RS (read and set) instruction, 4—153 


S 


S_floating data type 
alignment of, 2-8 
compared to F_floating, 2-8 
exceptions, 2-8 
mapping, 2-7 
MAX/MIN, 4-65 
NaN with T_floating convert, 4-88 
operations, 4-62 
S4ADDL instruction, 4-26 
S4ADDQ instruction, 4—28 
S4SUBL instruction, 4-38 
S4SUBQ instruction, 4-40 
S8ADDL instruction, 4-26 
S8ADDQ instruction, 4-28 


S8SUBL instruction, 4-38 
S8SUBQ instruction, 4-40
SBZ (should be zero), 1-9 
Security holes, 1-7 

with UNPREDICTABLE results, 1-8 
Serialization, MB instruction with, 4-143 
SEXT(x) operator, 3-9 
Shared data (multiprocessor) 

changed vs. updated datum, 5-6 


Shared data structures 


atomic update, 5-7 
ordering considerations, 5—9 
using memory barrier (MB) instruction, 5-9 


Shared memory 


accessing, 5-11 
defined, 5-10 


Shift arithmetic instructions, 4—46 
Sign extend instructions, 4-60 
Single-precision floating-point, 4-62 
SLL instruction, 4-45 
SQRTF instruction, 4-129 
SQRTG instruction, 4-129 
SQRTS instruction, 4—130 
SQRTT instruction, 4-130 
Square root instructions 

IEEE, 4-130

VAX, 4-129 
SRA instruction, 4-46 
SRL instruction, 4—45 
STB instruction, 4—15 
STF instruction, 4-95 
STG instruction, 4-96 
STL instruction, 4-15 


STL_C instruction, 4-12 
when guaranteed ordering with LDL_L, 4-14 
with LDx_L instruction, 4—12 
with processor lock register/flag, 4-12 


Storage, defined, 5-14 


Store instructions 
emulation of, 4-3 
FETCH instruction, 4—140 
multiprocessor environment, 5-6 
serialization, 4-143 
Store byte, 4-15 
store longword, 4—15 
store longword conditional, 4-12 
store quadword, 4—15 
store quadword conditional, 4—12 
Store word, 4—15 
STQ_U, 4-17 
See also Floating-point store instructions 


Store memory integer instructions, 4—4 
STORE_CONDITIONAL operator, 3-9
Store-conditional, defined, 5—16 
STQ instruction, 4-15 


STQ_C instruction, 4-12 


when guaranteed ordering with LDQ_L, 4-14 
with LDx_L instruction, 4-12 
with processor lock register/flag, 4-12 


STQ_U instruction, 4-17 


STS instruction, 4-97 
with FPCR, 4-84 
STT instruction, 4-98 


STW instruction, 4-15 
SUBF instruction, 4-131 
SUBG instruction, 4-131 
SUBL instruction, 4-37 
SUBQ instruction, 4-39 
SUBS instruction, 4-132 
SUBT instruction, 4-132 


Subtract instructions 
subtract longword, 4-37 
subtract quadword, 4-39 
subtract scaled longword, 4-38 
subtract scaled quadword, 4—40 
See also Floating-point operate 


SUM bit. See Summary bit 
Summary bit, in FPCR, 4-80 


SWPPAL (PALcode) instruction 
required recognition of, 6-4 

swppal (PALcode) instruction 
required recognition of, 6-4 


T 


T_floating data type 
alignment of, 2-9 
exceptions, 2-9 
format, 2-9 
MAX/MIN, 4-65 
NaN with S_floating convert, 4—88 


TEST(x,cond) operator, 3-10 
Timeliness of location access, 5—17 


Trap disable bits, 4-78 
denormal operand exception, 4—81 
division by zero, 4-81 
DZED with DZE arithmetic trap, 4-77 
DZED with INV arithmetic trap, 4-76 
inexact result, 4—80 
invalid operation, 4—81 
overflow disable, 4-81 
underflow, 4-80 
underflow to zero, 4-80 
when unimplemented, 4—79 


Trap handler, with non-finite arithmetic operands,
4-74
Trap modes, floating-point, 4-69 
Trap shadow 
defined for floating-point, 4-64 
programming implications for, 5-30 
TRAPB (trap barrier) instruction 


described, 4—146 
with FPCR, 4-84 


True result, 4-64 
True zero, 4—65 


U 


UMULH instruction, 4-36 
with MULQ, 4-35 
UNALIGNED data objects, 1-8 


Unconditional long jump, 4—23 

UNDEFINED operations, 1-7 

UNDZ bit. See Trap disable bits, underflow to zero 
UNF bit. See Arithmetic traps, underflow 

UNED bit. See Trap disable bits, underflow 
UNORDERED memory references, 5—10 

Unpack to bytes instructions, 4-159 


UNPKBL (Unpack bytes to longwords) instruction, 
4-159 


UNPKBW (Unpack bytes to words) instruction, 
4-159 


UNPREDICTABLE results, 1-7 
Updated datum, 5-6 


V 


VAX compatibility instructions, restrictions for, 
4-152 
VAX compatibility register, 3-3 
VAX floating-point 
D_floating, 2-5 
F_floating, 2-3 
G_floating, 2-4 
See also Floating-point instructions 
VAX floating-point instructions 


add instructions, 4-110 

compare instructions, 4-112

convert from integer instructions, 4-115 

convert to integer instructions, 4-114 

convert VAX floating format instructions, 
4-116 

divide instructions, 4—121 

from integer move, 4—125 

function field format, 4-87 

multiply instructions, 4-127 
operate instructions, 4-102
square root instructions, 4-129 
subtract instructions, 4-131 


VAX rounding modes, 4—66 


Vector instructions 


byte and word maximum, 4-155 
byte and word minimum, 4—155 


Virtual D-cache, 5-4 
Virtual I-cache, 5-4 

maintaining coherency of, 5-5 
Visibility, defined, 5-14 


W 


WH64 (Write hint) instruction, 4—147 
WH64 instruction, lock_flag with, 4-10 


WMB (Write memory barrier) instruction, 4—149 


atomic operations with, 5—8 
compared with MB, 4-150 
with shared data structures, 5—9 


Word data type, 2-1 
atomic access of, 5-3 
Write buffers, requirements for, 5-5 





Write-back caches, requirements for, 5-5 


wrunique (PALcode) instruction 
required recognition of, 6—4 


X 


x MOD y operator, 3-8 


X_floating datatype, 2-9 
alignment of, 2-10 


big-endian format, 2-10 
MAX/MIN, 4—65 


XOR instruction, 4-42 
XOR operator, 3-10 


Y 


YUV coordinates, interleaved, 4—154 


Z 


ZAP instruction, 4-61 
ZAPNOT instruction, 4-61 
Zero byte instructions, 4-61 
ZEXT(x) operator, 3-10
OpenVMS Alpha Software (II-A)


This section describes how the OpenVMS Alpha operating system relates to the Alpha
architecture and contains the following chapters:


• Chapter 1, Introduction to OpenVMS Alpha (II-A)

• Chapter 2, PALcode Instruction Descriptions (II-A)

• Chapter 3, Memory Management (II-A)

• Chapter 4, Process Structure (II-A)

• Chapter 5, Internal Processor Registers (II-A)

• Chapter 6, Exceptions, Interrupts, and Machine Checks (II-A)
Chapter 1


Introduction to OpenVMS Alpha (II-A)


The goal of this design is to provide an interface between the OpenVMS Alpha operating
system and the hardware that is independent of any particular hardware implementation.
Further, the design provides the abstractions needed to minimize the impact of different
hardware implementations on OpenVMS Alpha. Finally, the design must contain only the
overhead necessary to satisfy those requirements, while still supporting high-performance systems.


1.1 Register Usage 


In addition to those registers described in Part I, Common Architecture, OpenVMS Alpha 
defines the registers described in the following sections. 


1.1.1 Processor Status 


The Processor Status (PS) is a special register that contains the current status of the processor. 
It can be read by the CALL_PAL RD_PS instruction. The software field PS<SW> can be written
by the CALL_PAL WR_PS_SW routine. See Chapter 6 for a description of the PS register.


1.1.2 Stack Pointer (SP) 


Integer register R30 is the Stack Pointer (SP). 
The SP contains the address of the top of the stack in the current mode. 


Certain PALcode instructions, such as CALL_PAL REI, use R30 as an implicit operand. Dur- 
ing such operations, the address value in R30, interpreted as an unsigned 64-bit integer, 
decreases (predecrements) when items are pushed onto the stack and increases (postincre- 
ments) when they are popped from the stack. After pushing (writing) an item to the stack, SP 
points to that item. 
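The predecrement and postincrement behavior described above can be sketched as a small model. This is an illustration only, not PALcode: the `Stack` class, its dictionary-backed memory, and the starting address are assumptions made for the example, and each item is taken to be one quadword (8 bytes).

```python
MASK64 = (1 << 64) - 1  # SP is interpreted as an unsigned 64-bit integer


class Stack:
    """Toy model of a predecrement/postincrement quadword stack (hypothetical)."""

    def __init__(self, sp):
        self.sp = sp & MASK64
        self.memory = {}  # address -> 64-bit value; stands in for real memory

    def push(self, value):
        # Predecrement: SP moves down first, then the item is written.
        self.sp = (self.sp - 8) & MASK64
        self.memory[self.sp] = value & MASK64

    def pop(self):
        # Postincrement: the item is read first, then SP moves up.
        value = self.memory[self.sp]
        self.sp = (self.sp + 8) & MASK64
        return value


stack = Stack(0x1000)
stack.push(0xAAAA)
stack.push(0xBBBB)
```

After each push in this model, `stack.sp` holds the address of the item just written, which matches the statement that SP points to the pushed item.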


1.1.3 Internal Processor Registers (IPRs) 


The IPRs provide an architected mapping to internal hardware or provide other specialized 
uses. They are available only to privileged software through PALcode routines and allow 
OpenVMS Alpha to interrogate or modify system state. The IPRs are described in Chapter 5. 
1.1.4 Processor Cycle Counter (PCC)


The PCC register consists of two 32-bit fields. The low-order 32 bits (PCC<31:0>) are an 
unsigned, wrapping counter, PCC_CNT. The high-order 32 bits (PCC<63:32>) are an offset, 
PCC_OFF. PCC_OFF is a value that, when added to PCC_CNT, gives the total PCC register 
count for this process, modulo 2**32. 
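As an illustration of the arithmetic, the following sketch splits a 64-bit PCC value into its two fields and forms the per-process count. The function name `process_cycle_count` is hypothetical; the field positions are those given above.

```python
def process_cycle_count(pcc):
    """Return (PCC_CNT + PCC_OFF) modulo 2**32 for a 64-bit PCC value."""
    pcc_cnt = pcc & 0xFFFFFFFF          # PCC<31:0>, unsigned wrapping counter
    pcc_off = (pcc >> 32) & 0xFFFFFFFF  # PCC<63:32>, per-process offset
    return (pcc_cnt + pcc_off) & 0xFFFFFFFF


# Example: a counter close to wrapping plus a small offset.
example_pcc = (0x00000010 << 32) | 0xFFFFFFF8
total = process_cycle_count(example_pcc)
```

Because the sum is taken modulo 2**32, the example wraps around instead of carrying into bit 32.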
1.2 Revision History


Revision 7.0, November 13, 1997
1. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994
1. Added eco 69, PCC register definition
2. Alpha -> Alpha AXP


Revision 1.0, May 12, 1992
1. Created for SRM Version 5


2. First review distribution
Chapter 2


PALcode Instruction Descriptions (II-A)


This chapter describes the PALcode instructions that are implemented for the OpenVMS 
Alpha environment. The PALcode instructions are a set of unprivileged and privileged 
CALL_PAL instructions that are used to match specific operating system requirements to the 
underlying hardware implementation. 


For example, privileged PALcode instructions switch the hardware context of a process struc- 
ture. Unprivileged PALcode instructions implement the uninterruptible queue operations. 
Also, PALcode instructions provide mechanisms for standard interrupt and exception report- 
ing that are independent of the underlying hardware implementation. 


Table 2-1 lists all the unprivileged and privileged OpenVMS Alpha PALcode instructions and 
the section in this chapter in which they are described. 


Table 2-1: OpenVMS Alpha PALcode Instructions


Mnemonic     Operation                               Section

AMOVRM       Atomic move register/memory             2.4.1
AMOVRR       Atomic move register/register           2.4.1
BPT          Breakpoint                              2.1.1
BUGCHK       Bugcheck                                2.1.2
CFLUSH       Cache flush                             2.6.1
CHME         Change mode to executive                2.1.3
CHMK         Change mode to kernel                   2.1.4
CHMS         Change mode to supervisor               2.1.5
CHMU         Change mode to user                     2.1.6
CLRFEN       Clear floating-point enable             2.1.7
CSERVE       Console service                         2.6.2
DRAINA       Drain aborts                            Common Architecture, Chap. 6
GENTRAP      Generate software trap                  2.1.8
HALT         Halt processor                          Common Architecture, Chap. 6
IMB          I-stream memory barrier                 Common Architecture, Chap. 6
INSQxxx      Insert in specified queue               2.3
LDQP         Load quadword physical                  2.6.3
MFPR         Move from processor register            2.6.4
MTPR         Move to processor register              2.6.5
PROBER       Probe read access                       2.1.9
PROBEW       Probe write access                      2.1.9
RD_PS        Read processor status                   2.1.10
READ_UNQ     Read unique context                     2.5.1
REI          Return from exception or interrupt      2.1.11
REMQxxx      Remove from specified queue             2.3
RSCC         Read system cycle counter               2.1.12
STQP         Store quadword physical                 2.6.6
SWASTEN      Swap AST enable                         2.1.13
SWPCTX       Swap privileged context                 2.6.7
SWPPAL       Swap PALcode image                      2.6.8
WRITE_UNQ    Write unique context                    2.5.2
WR_PS_SW     Write processor status software field   2.1.14
WTINT        Wait for interrupt                      2.6.9


2.1 Unprivileged General PALcode Instructions 


The general unprivileged instructions in this section, together with those in Sections 2.3, 2.4,
and 2.5, provide support for the underlying OpenVMS Alpha model. 


Table 2-2: Unprivileged General PALcode Instruction Summary 


Mnemonic Operation 

BPT Breakpoint 

BUGCHK Bugcheck 

CHME Change mode to executive 

CHMK Change mode to kernel 

CHMS Change mode to supervisor 

CHMU Change mode to user 

CLRFEN Clear floating-point enable 
GENTRAP Generate software trap 

IMB I-stream memory barrier. See Common Architecture, Chapter 6. 
PROBER Probe read access 

PROBEW Probe write access 

RD_PS Read processor status 

REI Return from exception or interrupt 
RSCC Read system cycle counter 
SWASTEN Swap AST enable 

WR_PS_SW Write processor status software field
2.1.1 Breakpoint


Format:
CALL_PAL BPT ! PALcode format


Operation: 
{initiate BPT exception with new_mode=kernel} 
Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL BPT Breakpoint 


Description: 


The BPT instruction is provided for program debugging. It switches to kernel mode and 
pushes R2..R7, the updated PC, and PS on the kernel stack. It then dispatches to the address in 
the Breakpoint SCB vector. See Section 6.3.3.2.1. 
2.1.2 Bugcheck


Format:
CALL_PAL BUGCHK ! PALcode format


Operation:


{initiate BUGCHK exception with new_mode=kernel}
! R16 contains a value encoding for the bugchk trap


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL BUGCHK Bugcheck 


Description: 


The BUGCHK instruction is provided for error reporting. It switches to kernel mode and 
pushes R2..R7, the updated PC, and PS on the kernel stack. It then dispatches to the address in 
the Bugcheck SCB vector. See Section 6.3.3.2.2. 


The value in R16 identifies the particular bugcheck type. Interpretation of the encoded value 
determines the course of action by the operating system. 
2.1.3 Change Mode to Executive


Format:
CALL_PAL CHME ! PALcode format


Operation:
tmp1 ← MINU(1, PS<CM>)
{initiate CHME exception with new_mode=tmp1}


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL CHME Change Mode to Executive


Description: 


The CHME instruction lets a process change its mode in a controlled manner. 


A change in mode also results in a change of stack pointers: the old pointer is saved, the new 
pointer is loaded. R2..R7, PC, and PS are pushed onto the selected stack. The saved PC 
addresses the instruction following the CHME instruction. Registers R22, R23, R24, and R27 
are available for use by PALcode as scratch registers. The contents of these registers are not 
preserved across a CHME. 
2.1.4 Change Mode to Kernel


Format: 
CALL_PAL CHMK ! PALcode format 


Operation: 


{initiate CHMK exception with new_mode=kernel} 


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL CHMK Change Mode to Kernel 


Description: 


The CHMK instruction lets a process change its mode to kernel in a controlled manner. 


A change in mode also results in a change of stack pointers: the old pointer is saved, the new 
pointer is loaded. R2..R7, PC, and PS are pushed onto the kernel stack. The saved PC 
addresses the instruction following the CHMK instruction. Registers R22, R23, R24, and R27 
are available for use by PALcode as scratch registers. The contents of these registers are not 
preserved across a CHMK. 
2.1.5 Change Mode to Supervisor


Format: 


CALL_PAL CHMS ! PALcode format 


Operation: 

tmp1 ← MINU(2, PS<CM>)

{initiate CHMS exception with new_mode=tmp1} 
Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL CHMS Change Mode to Supervisor 


Description: 


The CHMS instruction lets a process change its mode in a controlled manner. 


A change in mode also results in a change of stack pointers: the old pointer is saved, the new 
pointer is loaded. R2..R7, PC, and PS are pushed onto the selected stack. The saved PC 
addresses the instruction following the CHMS instruction. 
2.1.6 Change Mode to User


Format: 
CALL_PAL CHMU ! PALcode format 


Operation: 


{initiate CHMU exception with new_mode=PS<CM>} 
Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL CHMU Change Mode to User 


Description: 


The CHMU instruction lets a process call a routine by using the change mode mechanism. 


R2..R7, PC, and PS are pushed onto the current stack. The saved PC addresses the instruction 
following the CHMU instruction. 


The CALL_PAL CHMU instruction is provided for VAX compatibility only. 
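The new_mode selection used by the four change mode instructions can be summarized in a short sketch. It is an illustration under stated assumptions, not the PALcode itself: the function name `change_mode_target` is hypothetical, and the mode encodings (kernel 0, executive 1, supervisor 2, user 3) are taken from the CASE statements in the REI operation in Section 2.1.11.

```python
KERNEL, EXECUTIVE, SUPERVISOR, USER = 0, 1, 2, 3


def change_mode_target(instruction, current_mode):
    """Return the mode in which the change mode exception is initiated."""
    if instruction == "CHMK":
        return KERNEL                         # new_mode = kernel
    if instruction == "CHME":
        return min(EXECUTIVE, current_mode)   # MINU(1, PS<CM>)
    if instruction == "CHMS":
        return min(SUPERVISOR, current_mode)  # MINU(2, PS<CM>)
    if instruction == "CHMU":
        return current_mode                   # new_mode = PS<CM>
    raise ValueError("not a change mode instruction: " + instruction)
```

The unsigned minimum means a change mode instruction never lowers privilege: CHMS issued from executive mode, for example, stays in executive mode.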
2.1.7 Clear Floating-Point Enable


Format: 
CALL_PAL CLRFEN ! PALcode format 


Operation: 


FEN ← 0
(HWPCB+56)<0> ← 0 ! Update HWPCB on Write


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL CLRFEN Clear floating-point enable 


Description: 


The CLRFEN instruction writes a zero to the floating-point enable register and to the HWPCB 
at offset (HWPCB+56)<0>. 
2.1.8 Generate Software Trap


Format: 
CALL_PAL GENTRAP ! PALcode format 


Operation: 


{initiate GENTRAP exception with new_mode=kernel} 
! R16 contains the value encoding of the software trap 


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction mnemonics: 


CALL_PAL GENTRAP Generate Software Trap 


Description: 


The GENTRAP instruction is provided for reporting run-time software conditions. It switches 
to kernel mode and pushes R2...R7, the updated PC, and PS on the kernel stack. It then dis- 
patches to the address in the GENTRAP SCB Vector. See Section 6.6. 


The value in R16 identifies the particular software condition that has occurred. The encoding 
for the software trap values is given in the software calling standard for the system. 
2.1.9 Probe Memory Access


Format:
CALL_PAL PROBE ! PALcode format


Operation: 


! R16 contains the base address
! R17 contains the signed offset
! R18 contains the access mode
! R0 receives the completion status
!    ← 1 if success
!    ← 0 if failure


first ← R16
last ← {R16+R17}


IF R18<1:0> GTU PS<CM> THEN
    probe_mode ← R18<1:0>
ELSE
    probe_mode ← PS<CM>


IF ACCESS(first, probe_mode) AND ACCESS(last, probe_mode) THEN
    R0 ← 1
ELSE
    R0 ← 0

Exceptions: 


Translation Not Valid 


Instruction mnemonics: 


CALL_PAL PROBER Probe for Read Access
CALL_PAL PROBEW Probe for Write Access 
Description: 


The PROBE instruction checks the read or write accessibility of the first and last byte
specified by the base address and the signed offset; the bytes in between are not checked.


System software must check all pages between the two bytes if they are to be accessed. If both
bytes are accessible, PROBE returns the value 1 in R0; otherwise, PROBE returns 0. The
Fault on Read and Fault on Write PTE bits are not checked. A Translation Not Valid exception
is signaled only if the mapping structures cannot be accessed. A Translation Not Valid
exception is signaled only if a higher-level PTE (above Level 3) is invalid.
The protection is checked against the less privileged of the modes specified by R18<1:0> and
the Current Mode (PS<CM>). See Section 6.2 for access mode encodings.


PROBE is only intended to check a single datum for accessibility. It does not check all inter- 
vening pages because this could result in excessive interrupt latency. 
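The PROBE operation can be sketched as follows. This is a minimal model, not the PALcode implementation: the `accessible(address, mode)` callback is a hypothetical stand-in for the ACCESS check against the page tables, and the toy protection rule at the end exists only to exercise the function.

```python
MASK64 = (1 << 64) - 1


def probe_mode(requested_mode, current_mode):
    """Pick the less privileged (numerically larger) of R18<1:0> and PS<CM>."""
    return max(requested_mode & 3, current_mode & 3)


def probe(base, offset, requested_mode, current_mode, accessible):
    """Return 1 if the first and last bytes are accessible, else 0.

    Only the two end bytes are checked; bytes in between are not.
    """
    mode = probe_mode(requested_mode, current_mode)
    first = base & MASK64
    last = (base + offset) & MASK64  # offset is signed
    return 1 if accessible(first, mode) and accessible(last, mode) else 0


def toy_access(address, mode):
    # Hypothetical rule: addresses below 0x2000 are open to every mode,
    # everything else is kernel (mode 0) only.
    return address < 0x2000 or mode == 0
```

Because only the end bytes are examined, a caller that intends to touch the whole range still has to probe every intervening page itself, as the description states.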
2.1.10 Read Processor Status


Format: 
CALL_PAL RD_PS ! PALcode format 


Operation: 
R0 ← PS
Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL RD_PS Read Processor Status 


Description: 


The RD_PS instruction returns the Processor Status (PS) in register R0. The Processor Status
is described in Section 6.2. The PS<SP_ALIGN> field is always a zero on a RD_PS. 
2.1.11 Return from Exception or Interrupt


Format:
CALL_PAL REI ! PALcode format


Operation: 


! See Chapter 6 
! for information on interrupted registers 


IF SP<5:0> NE 0 THEN
    {illegal operand}

tmp1 <- (SP)                    ! Get saved R2
tmp2 <- (SP+8)                  ! Get saved R3
tmp3 <- (SP+16)                 ! Get saved R4
tmp4 <- (SP+24)                 ! Get saved R5
tmp5 <- (SP+32)                 ! Get saved R6
tmp6 <- (SP+40)                 ! Get saved R7
tmp7 <- (SP+48)                 ! Get new PC
tmp8 <- (SP+56)                 ! Get new PS

ps_chk <- tmp8                  ! Copy new ps
ps_chk<cm> <- 0                 ! Clear cm field
ps_chk<sp_align> <- 0           ! Clear sp_align field
ps_chk<sw> <- 0                 ! Clear Software Field
intr_flag <- 0                  ! Clear except/inter/mcheck flag

{clear lock_flag}

! If current mode is not kernel check the new ps is valid.
IF {ps<cm> NE 0} AND
   {{tmp8<cm> LT ps<cm>} OR {ps_chk NE 0}} THEN
BEGIN
    {illegal operand}
END

sp <- {sp + 8*8} OR tmp8<sp_align>
IF {internal registers for stack pointers} THEN
    CASE ps<cm> BEGIN
        [0]: ipr_ksp <- sp
        [1]: ipr_esp <- sp
        [2]: ipr_ssp <- sp
        [3]: ipr_usp <- sp
    ENDCASE
    CASE tmp8<cm> BEGIN
        [0]: sp <- ipr_ksp
        [1]: sp <- ipr_esp
        [2]: sp <- ipr_ssp
        [3]: sp <- ipr_usp
    ENDCASE
ELSE
    (pcbb + 8*ps<cm>) <- sp
    sp <- (pcbb + 8*tmp8<cm>)
ENDIF

R2 <- tmp1
R3 <- tmp2
R4 <- tmp3
R5 <- tmp4
R6 <- tmp5
R7 <- tmp6
PC <- tmp7
PS <- tmp8<12:00>


{Initiate interrupts or AST interrupts that are now pending} 


Exceptions: 


Access Violation 

Fault on Read 

Illegal Operand 

Kernel Stack Not Valid Halt 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL REI Return from Exception or Interrupt 


Description: 


The REI instruction pops the PS, PC, and saved R2...R7 from the current stack and holds them 
in temporary registers. The new PS is checked for validity and consistency. If it is invalid or 
inconsistent, an illegal operand exception occurs; otherwise the operation continues. A kernel 
to nonkernel REI with a new PS<IPL> not equal to zero may yield UNDEFINED results. 


The current stack pointer is then saved and a new stack pointer is selected according to the 
new PS<CM> field. R2 through R7 are restored using the saved values held in the temporary 
registers. A check is made to determine if an AST or other interrupt is pending (see Section 
6.7.6). 


If the enabling conditions are present for an interrupt or AST interrupt at the completion of 
this instruction, the interrupt or AST interrupt occurs before the next instruction. 


When an REI is issued, the current stack must be writeable from the current mode or an 
Access Violation may occur. 


Implementation Note: 


This is necessary so that an implementation can choose to clear the lock_flag by doing a 
STx_C to above the top-of-stack after popping PS, PC, and saved R2..R7 off the current 
stack. 
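

The validity check that REI applies to the new PS can be sketched as below. The bit positions used for PS<SW>, PS<CM>, and PS<SP_ALIGN> are assumptions made for this illustration (the authoritative layout is in Section 6.2), and the function name is hypothetical. The logic follows the operation description literally.

```python
# Assumed field positions, for illustration only; see Section 6.2.
SW_MASK = 0x3                             # assumed PS<SW> in bits <1:0>
CM_SHIFT, CM_MASK = 3, 0x3                # assumed PS<CM> in bits <4:3>
SP_ALIGN_SHIFT, SP_ALIGN_MASK = 56, 0x3F  # assumed PS<SP_ALIGN> in bits <61:56>

def rei_new_ps_is_legal(current_ps, new_ps):
    """Mirror the 'illegal operand' test in the REI operation description."""
    cur_cm = (current_ps >> CM_SHIFT) & CM_MASK
    if cur_cm == 0:
        return True                       # kernel mode: the new PS is not checked
    new_cm = (new_ps >> CM_SHIFT) & CM_MASK
    ps_chk = new_ps                       # copy new PS
    ps_chk &= ~(CM_MASK << CM_SHIFT)      # clear cm field
    ps_chk &= ~(SP_ALIGN_MASK << SP_ALIGN_SHIFT)  # clear sp_align field
    ps_chk &= ~SW_MASK                    # clear software field
    # Illegal if returning to a more privileged mode or if any other bit is set.
    return not (new_cm < cur_cm or ps_chk != 0)

USER = 3 << CM_SHIFT
user_to_user = rei_new_ps_is_legal(USER, USER | 0x1)
user_to_kernel = rei_new_ps_is_legal(USER, 0)
```

Under these assumptions, a user-mode REI may keep the mode and change the software field, but may neither lower the mode number nor set any other PS bit such as a nonzero IPL.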

2.1.12 Read System Cycle Counter


Format: 


CALL_PAL RSCC ! PALcode format


Operation: 


R0 <- {System Cycle Counter}


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL RSCC Read System Cycle Counter 


Description: 


The RSCC instruction writes register R0 with the value of the system cycle counter. This
counter is an unsigned 64-bit integer that increments at the same rate as the process cycle
counter. The cycle counter frequency, which is the number of times the system cycle counter
gets incremented per second rounded to a 64-bit integer, is given in the HWRPB. (See
Console Interface (III), Chapter 2.)


The system cycle counter is suitable for timing a general range of intervals to within 10% 
error and may be used for detailed performance characterization. It is required on all imple- 
mentations. SCC is required for every processor, and each processor in a multiprocessor 
system has its own private, independent SCC. 


Notes: 


Processor initialization starts the SCC at 0. 


SCC is monotonically increasing. On the same processor, the values returned by two
successive reads of SCC must either be equal or the value of the second must be greater
(unsigned) than the first.


SCC ticks are never lost so long as the SCC is accessed at least once per each PCC 
overflow period (2**32 PCC increments) during periods when the hardware clock 
interrupt remains blocked. The hardware clock interrupt is blocked whenever the IPL is 
at or above CLOCK_IPL or whenever the processor enters console I/O mode from pro- 
gram I/O mode. 


The 64-bit SCC may be constructed from the 32-bit PCC hardware counter and a 32-bit 
PALcode software counter. As part of the hardware clock interrupt processing, PAL- 
code increments the software counter whenever a PCC wrap is detected. Thus, SCC 
ticks may be lost only when PALcode fails to detect PCC wraps. In a machine where 
the PCC is incremented at a 1 ns rate, this may occur when hardware clock interrupts 
are blocked for greater than 4 seconds. 


An implementation-dependent mechanism must exist so that, when enabled, it causes
the RSCC instruction, as implemented by standard PALcode, always to return a zero in
R0. This mechanism must be usable by privileged system software. A similar mechanism
must exist for RPCC. Implementations are allowed to have only a single mechanism,
which when enabled causes both RSCC and RPCC to return zero.
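

The construction of a 64-bit SCC from a 32-bit PCC and a 32-bit software counter, as described in the notes above, can be modeled roughly as follows. The class and method names are hypothetical, and detecting a wrap by comparing against the previously observed PCC value is an assumption of this sketch rather than a documented mechanism.

```python
PCC_BITS = 32
PCC_MASK = (1 << PCC_BITS) - 1

class SccModel:
    """Toy model: upper 32 bits kept in software, lower 32 bits from the PCC."""

    def __init__(self):
        self.soft_hi = 0     # software counter, incremented on each detected wrap
        self.last_pcc = 0    # PCC value seen at the previous observation

    def observe(self, pcc):
        """Called from clock interrupt processing, or on a read."""
        pcc &= PCC_MASK
        if pcc < self.last_pcc:              # PCC wrapped since the last look
            self.soft_hi = (self.soft_hi + 1) & PCC_MASK
        self.last_pcc = pcc

    def read(self, pcc):
        self.observe(pcc)
        return (self.soft_hi << PCC_BITS) | (pcc & PCC_MASK)

scc = SccModel()
before_wrap = scc.read(0xFFFFFFF0)
after_wrap = scc.read(0x00000005)

# At a 1 ns increment rate, the PCC wraps after 2**32 ns.
wrap_seconds = (1 << PCC_BITS) * 1e-9
```

If no observation happens within one full PCC period, the wrap cannot be seen and ticks are lost, which is the failure case the note describes for clock interrupts blocked for more than about 4 seconds at a 1 ns rate.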

2.1.13 Swap AST Enable


Format: 
CALL_PAL SWASTEN ! PALcode format


Operation: 


R0 <- ZEXT(ASTEN<PS<CM>>)
ASTEN<PS<CM>> <- R16<0>
{check for pending ASTs}

Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL SWASTEN Swap AST Enable for Current Mode 


Description: 


The SWASTEN instruction swaps the AST enable bit for the current mode. The new state for
the enable bit is supplied in register R16<0>, and the previous state of the enable bit is
returned, zero extended, in R0.


A check is made to determine if an AST interrupt is pending (see Section 6.7.6.5). 


If the enabling conditions are present for an AST interrupt at the completion of this instruc- 
tion, the AST occurs before the next instruction. 
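

The swap itself is small enough to sketch directly. The function below is a hypothetical model that treats ASTEN as a 4-bit mask indexed by mode number, as the ASTEN<PS<CM>> notation in the operation suggests; the pending-AST check is omitted.

```python
def swasten(asten, current_mode, r16):
    """Return (R0, new ASTEN) for a swap of the current mode's enable bit."""
    old = (asten >> current_mode) & 1        # previous enable bit, zero extended
    new_bit = r16 & 1                        # only R16<0> is used
    asten = (asten & ~(1 << current_mode)) | (new_bit << current_mode)
    return old, asten

# Enable ASTs for mode 3, then disable them again.
r0_first, asten = swasten(0b0000, 3, 1)
r0_second, asten = swasten(asten, 3, 0xFE)   # R16<0> is 0 here
```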

2.1.14 Write Processor Status Software Field


Format: 
CALL_PAL WR_PS_SW ! PALcode format 


Operation: 
PS<SW> <- R16<1:0>

Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL WR_PS_SW Write Processor Status Software Field 


Description: 
The WR_PS_SW instruction writes the Processor Status software field (PS<SW>) with the 
low-order two bits of R16. The Processor Status is described in Section 6.2. 

2.2 Queue Data Types


The following sections describe the queue data types that are manipulated by the OpenVMS 
Alpha queue PALcode. Section 2.3 describes the PALcode instructions that perform the 
manipulation. 


2.2.1 Absolute Longword Queues 


A longword queue is a circular, doubly linked list. A longword queue entry is specified by its 
address. Each longword queue entry is linked to the next with a pair of longwords. A queue is 
classified by the type of link it uses. Absolute longword queues use absolute addresses as links. 


The first (lowest addressed) longword is the forward link; it specifies the address of the suc- 
ceeding longword queue entry. The second (highest addressed) longword is the backward link; 
it specifies the address of the preceding longword queue entry. 


A longword queue is specified by a longword queue header, which is identical to a pair of 
longword queue linkage longwords. The forward link of the header is the address of the entry 
termed the head of the longword queue. The backward link of the header is the address of the 
entry termed the tail of the longword queue. The forward link of the tail points to the header. 


An empty longword queue is specified by its header at address H, as shown in Figure 2-1. If
an entry at address B is inserted into an empty longword queue (at either the head or tail), the
longword queue shown in Figure 2-2 results. Figures 2-3, 2-4, and 2-5, respectively,
illustrate the results of subsequent insertion of an entry at address A at the head, insertion of
an entry at address C at the tail, and removal of the entry at address B.


The queue header and all entries in absolute longword queues need only be byte aligned. For 
better performance, quadword alignment (or higher) is recommended. 
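

The insertion and removal sequence described above can be traced with a small model. This is a sketch, not the PALcode: the class name and addresses are hypothetical, links are held in dictionaries rather than memory, and an empty header is assumed to link to itself.

```python
class AbsQueue:
    """Circular, doubly linked list keyed by entry address."""

    def __init__(self, header):
        self.header = header
        self.flink = {header: header}   # forward links (address -> address)
        self.blink = {header: header}   # backward links

    def insert_after(self, pred, entry):
        succ = self.flink[pred]
        self.flink[entry] = succ
        self.blink[entry] = pred
        self.blink[succ] = entry
        self.flink[pred] = entry

    def insert_head(self, entry):
        self.insert_after(self.header, entry)

    def insert_tail(self, entry):
        self.insert_after(self.blink[self.header], entry)

    def remove(self, entry):
        pred = self.blink.pop(entry)
        succ = self.flink.pop(entry)
        self.flink[pred] = succ
        self.blink[succ] = pred

    def entries(self):
        out, cur = [], self.flink[self.header]
        while cur != self.header:
            out.append(cur)
            cur = self.flink[cur]
        return out

H, A, B, C = 0x1000, 0x2000, 0x3000, 0x4000
q = AbsQueue(H)
q.insert_head(B)      # one entry
q.insert_head(A)      # A becomes the head
q.insert_tail(C)      # C becomes the tail
three = q.entries()
q.remove(B)
two = q.entries()
```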


2.2.2 Self-Relative Longword Queues 


Self-relative longword queues use displacements from longword queue entries as links. Long- 
word queue entries are linked by a pair of longwords. The first longword (lowest addressed) is 
the forward link; it is a displacement of the succeeding longword queue entry from the present 
entry. The second longword (highest addressed) is the backward link; it is the displacement of 
the preceding longword queue entry from the present entry. A longword queue is specified by 
a longword queue header, which also consists of two longword links. 


An empty longword queue is specified by its header at address H. Since the longword queue is 
empty, the self-relative links are zero, as shown in Figure 2-6. 


Four types of operations can be performed on self-relative queues: insert at head, insert at tail, 
remove from head, and remove from tail. Furthermore, these operations are interlocked to 
allow cooperating processes in a multiprocessor system to access a shared list without addi- 
tional synchronization. A hardware-supported, interlocked memory-access mechanism is used 
to modify the queue header. Bit <0> of the queue header is used as a secondary interlock and 
is set when the queue is being accessed. 


If an interlocked queue CALL_PAL instruction encounters the secondary interlock set, then,
in the absence of exceptions, it terminates after setting R0 to -1 to indicate failure to gain
access to the queue. If the secondary interlock bit is not set, then it is set during the interlocked
queue operation and is cleared upon completion of the operation. This prevents other
interlocked queue CALL_PAL instructions from operating on the same queue.


If both the secondary interlock is set and an exception condition occurs, it is UNPREDICT- 
ABLE whether the exception will be reported. 


The queue header and all entries in self-relative longword queues must be at least quadword 
aligned. 


Figures 2-7, 2-8, and 2-9, respectively, illustrate the results of subsequent insertion of an
entry at address B at the head, insertion of an entry at address A at the tail, and insertion of an
entry at address C at the tail.


Figures 2-9, 2-8, and 2-7 (in that order) illustrate the effect of removal at the tail and removal
at the head.
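

To make the displacement arithmetic concrete, the following sketch computes the forward and backward links that a self-relative queue would hold for a given set of addresses. The function name and the addresses are hypothetical.

```python
def self_relative_links(header, entries):
    """Map each address to (forward displacement, backward displacement).

    `entries` lists the entry addresses in head-to-tail order; the header
    closes the ring.
    """
    ring = [header] + list(entries)
    n = len(ring)
    return {
        ring[i]: (ring[(i + 1) % n] - ring[i], ring[(i - 1) % n] - ring[i])
        for i in range(n)
    }

empty = self_relative_links(0x100, [])
one = self_relative_links(0x100, [0x200])
three = self_relative_links(0x100, [0x200, 0x180, 0x300])

# With quadword-aligned addresses every displacement is a multiple of 8,
# so bit <0> of the header's forward link is free for the secondary interlock.
low_bits_clear = all(f % 8 == 0 and b % 8 == 0 for f, b in three.values())
```

An empty queue yields zero links, which matches the empty-queue description, and the alignment requirement is what leaves the low bits of each link available.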


Figure 2-1: Empty Absolute Longword Queue

Figure 2-4: Absolute Longword Queue with Three Entries

Figure 2-5: Absolute Longword Queue with Three Entries After Removing the Second Entry

Figure 2-6: Empty Self-Relative Longword Queue

Figure 2-7: Self-Relative Longword Queue with One Entry

Figure 2-8: Self-Relative Longword Queue with Two Entries

Figure 2-9: Self-Relative Longword Queue with Three Entries

2.2.3 Absolute Quadword Queues


A quadword queue is a circular, doubly linked list. A quadword queue entry is specified by its 
address. Each quadword queue entry is linked to the next with a pair of quadwords. A queue is 
classified by the type of link it uses. Absolute quadword queues use absolute addresses as 
links. 

The first (lowest addressed) quadword is the forward link; it specifies the address of the suc-
ceeding quadword queue entry. The second (highest addressed) quadword is the backward 
link; it specifies the address of the preceding quadword queue entry. 


A quadword queue is specified by a quadword queue header, which is identical to a pair of 
quadword queue linkage quadwords. The forward link of the header is the address of the entry 
termed the head of the quadword queue. The backward link of the header is the address of the 
entry termed the tail of the quadword queue. The forward link of the tail points to the header. 


An empty quadword queue is specified by its header at address H, as shown in Figure 2-10. If 
an entry at address B is inserted into an empty quadword queue (at either the head or tail), the 
quadword queue shown in Figure 2-11 results. Figures 2-12, 2-13, and 2-14, respectively, 
illustrate the results of subsequent insertion of an entry at address A at the head, insertion of 
an entry at address C at the tail, and removal of the entry at address B. 


The queue header and all entries in absolute quadword queues must be at least octaword 
aligned. 


2.2.4 Self-Relative Quadword Queues 


Self-relative quadword queues use displacements from quadword queue entries as links. Quad- 
word queue entries are linked by a pair of quadwords. The first quadword (lowest addressed) 
is the forward link; it is a displacement of the succeeding quadword queue entry from the 
present entry. The second quadword (highest addressed) is the backward link; it is the displace- 
ment of the preceding quadword queue entry from the present entry. A quadword queue is 
specified by a quadword queue header, which also consists of two quadword links. 


An empty quadword queue is specified by its header at address H. Since the quadword queue 
is empty, the self-relative links are zero, as shown in Figure 2-15. 


Four types of operations can be performed on self-relative queues: insert at head, insert at tail, 
remove from head, and remove from tail. Furthermore, these operations are interlocked to 
allow cooperating processes in a multiprocessor system to access a shared list without addi- 
tional synchronization. A hardware-supported, interlocked memory-access mechanism is used 
to modify the queue header. Bit <0> of the queue header is used as a secondary interlock and
is set when the queue is being accessed. 


If an interlocked queue CALL_PAL instruction encounters the secondary interlock set, then,
in the absence of exceptions, it terminates after setting R0 to -1 to indicate failure to gain
access to the queue. If the secondary interlock bit is not set, it is set during the interlocked
queue operation and is cleared upon completion of the operation. This prevents other
interlocked queue CALL_PAL instructions from operating on the same queue.


If both the secondary interlock is set and an exception condition occurs, it is UNPREDICT- 
ABLE whether the exception will be reported. 


The queue header and all entries in self-relative quadword queues must be at least octaword 
aligned. 


Figures 2-16, 2-17, and 2-18, respectively, illustrate the results of subsequent insertion of an
entry at address B at the head, insertion of an entry at address A at the tail, and insertion of an
entry at address C at the tail.


Figures 2-18, 2-17, and 2-16 (in that order) illustrate the effect of removal at the tail and
removal at the head.


Figure 2-10: Empty Absolute Quadword Queue

Figure 2-13: Absolute Quadword Queue with Three Entries

Figure 2-14: Absolute Quadword Queue with Three Entries After Removing the Second Entry

Figure 2-15: Empty Self-Relative Quadword Queue

Figure 2-16: Self-Relative Quadword Queue with One Entry

Figure 2-17: Self-Relative Quadword Queue with Two Entries

Figure 2-18: Self-Relative Quadword Queue with Three Entries

2.3 Unprivileged Queue PALcode Instructions


The following unprivileged PALcode instructions perform atomic modification of the queue 
data types that are described in Section 2.2. 


Table 2-3: Queue PALcode Instruction Summary


Mnemonic    Operation

INSQHIL     Insert into longword queue at head, interlocked
INSQHILR    Insert into longword queue at head, interlocked, resident
INSQHIQ     Insert into quadword queue at head, interlocked
INSQHIQR    Insert into quadword queue at head, interlocked, resident
INSQTIL     Insert into longword queue at tail, interlocked
INSQTILR    Insert into longword queue at tail, interlocked, resident
INSQTIQ     Insert into quadword queue at tail, interlocked
INSQTIQR    Insert into quadword queue at tail, interlocked, resident
INSQUEL     Insert into longword queue
INSQUEQ     Insert into quadword queue
REMQHIL     Remove from longword queue at head, interlocked
REMQHILR    Remove from longword queue at head, interlocked, resident
REMQHIQ     Remove from quadword queue at head, interlocked
REMQHIQR    Remove from quadword queue at head, interlocked, resident
REMQTIL     Remove from longword queue at tail, interlocked
REMQTILR    Remove from longword queue at tail, interlocked, resident
REMQTIQ     Remove from quadword queue at tail, interlocked
REMQTIQR    Remove from quadword queue at tail, interlocked, resident
REMQUEL     Remove from longword queue
REMQUEQ     Remove from quadword queue
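

The mnemonics in Table 2-3 follow a regular pattern, which the hypothetical helper below decodes. It is offered only as a reading aid for the table; the function is not part of any Alpha toolchain.

```python
def describe(mnemonic):
    """Expand a queue PALcode mnemonic into the wording used in Table 2-3."""
    verb = {"INS": "Insert into", "REM": "Remove from"}[mnemonic[:3]]
    rest = mnemonic[3:]
    if rest in ("QUEL", "QUEQ"):                 # absolute (non-interlocked) forms
        size = "longword" if rest[-1] == "L" else "quadword"
        return f"{verb} {size} queue"
    # Interlocked forms: Q, then H or T, then I, then L or Q, then optional R.
    end = {"H": "head", "T": "tail"}[rest[1]]
    size = {"L": "longword", "Q": "quadword"}[rest[3]]
    text = f"{verb} {size} queue at {end}, interlocked"
    if len(rest) == 5 and rest[4] == "R":
        text += ", resident"
    return text

example = describe("REMQTIQR")
```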

2.3.1 Insert Entry into Longword Queue at Head Interlocked


Format: 
CALL_PAL INSQHIL ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!    -1 if the secondary interlock was set
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location and
! that the header and entry are valid 32 bit addresses
IF {R16<2:0> NE 0} OR {R17<2:0> NE 0} OR {R16 EQ R17} OR
   {SEXT(R16<31:0>) NE R16} OR {SEXT(R17<31:0>) NE R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))               ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                      ! Try to set secondary interlock
        R0 <- -1, {return}                    ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}             ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
IF {tmp1<2:1> NE 0} THEN                      ! Check alignment
BEGIN                                         ! Release secondary interlock.
    (R16) <- tmp0
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!    entry
!    header + tmp1
IF {all memory accesses can NOT be completed} THEN
BEGIN                                         ! Release secondary interlock.
    (R16) <- tmp0
    {initiate memory management fault}
END

! All accesses can be done so enqueue the entry
tmp2 <- SEXT({R16 - R17}<31:0>)
(R17)<31:0> <- tmp1 + tmp2                    ! Forward link
(R17 + 4)<31:0> <- tmp2                       ! Backward link
(R16 + tmp1 + 4)<31:0> <- -tmp1 - tmp2        ! Successor back link
MB
(R16)<31:0> <- -tmp2                          ! Forward link of header
                                              ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 1                                   ! Queue was empty
ELSE
    R0 <- 0                                   ! Queue was not empty
END

Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL INSQHIL Insert into Longword Queue at Head Interlocked 


Description: 


If the secondary interlock is clear, INSQHIL inserts the entry specified in R17 into the
self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. Before the insertion, the processor validates that the entire
operation can be completed. This ensures that if a memory management exception occurs, the
queue is left in a consistent state (see Chapters 3 and 6). If the instruction fails to acquire the
secondary interlock after "N" retry attempts, then (in the absence of exceptions) R0 is set to
-1. The value "N" is implementation dependent. \The selected initial value of N is 20.\
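

The link arithmetic of INSQHIL can be exercised with a single-threaded Python model. This is a sketch under simplifying assumptions: memory is a flat little-endian byte array, the LOAD_LOCKED/STORE_CONDITIONAL retry loop is replaced by a plain read and write, the 32-bit address range check and the memory management checks are omitted, and all helper names are hypothetical.

```python
import struct

def sext32(x):
    """Sign-extend the low 32 bits of x."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def rd32(mem, addr):
    return struct.unpack_from("<i", mem, addr)[0]

def wr32(mem, addr, val):
    struct.pack_into("<i", mem, addr, sext32(val))

def insqhil(mem, header, entry):
    """Insert `entry` at the head; return -1, 0, or 1 as R0 would."""
    if header & 7 or entry & 7 or header == entry:
        raise ValueError("illegal operand")
    tmp0 = rd32(mem, header)
    if tmp0 & 1:
        return -1                              # secondary interlock already set
    wr32(mem, header, tmp0 | 1)                # set secondary interlock
    tmp1 = tmp0                                # forward link of header
    if tmp1 & 6:
        wr32(mem, header, tmp0)                # release secondary interlock
        raise ValueError("illegal operand")
    tmp2 = sext32(header - entry)
    wr32(mem, entry, tmp1 + tmp2)              # forward link of entry
    wr32(mem, entry + 4, tmp2)                 # backward link of entry
    wr32(mem, header + tmp1 + 4, -tmp1 - tmp2) # successor back link
    wr32(mem, header, -tmp2)                   # header forward link, releases lock
    return 1 if tmp1 == 0 else 0

mem = bytearray(0x400)
H, B, A = 0x100, 0x200, 0x300
first = insqhil(mem, H, B)     # queue was empty
second = insqhil(mem, H, A)    # A becomes the new head
```

After the two insertions the header's forward link is A - H and A's forward link is B - A, so following displacements from the header visits A and then B.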

2.3.2 Insert Entry into Longword Queue at Head Interlocked Resident


Format: 
CALL_PAL INSQHILR ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!    -1 if the secondary interlock was set
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))               ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                      ! Try to set secondary interlock.
        R0 <- -1, {return}                    ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}             ! Retry exceeded
MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT({R16 - R17}<31:0>)               ! Enqueue the entry
(R17)<31:0> <- tmp1 + tmp2                    ! Forward link of entry.
(R17 + 4)<31:0> <- tmp2                       ! Backward link of entry.
(R16 + tmp1 + 4)<31:0> <- -tmp1 - tmp2        ! Successor back link

MB
(R16)<31:0> <- -tmp2                          ! Forward link of header
                                              ! Release the lock
IF tmp1 EQ 0 THEN
    R0 <- 1                                   ! Queue was empty
ELSE
    R0 <- 0                                   ! Queue was not empty
END

Exceptions:


Illegal Operand 


Instruction mnemonics: 


CALL_PAL INSQHILR Insert Entry into Longword Queue at Head 
Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQHILR inserts the entry specified in R17 into the
self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue header and
elements are quadword aligned. No alignment or memory management checks are made before
starting queue modifications to verify these requirements. Therefore, if any of these
requirements are not met, the queue may be left in an UNPREDICTABLE state and an illegal
operand fault may be reported.

2.3.3 Insert Entry into Quadword Queue at Head Interlocked


Format: 
CALL_PAL INSQHIQ ! PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!    -1 if the secondary interlock was set
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location
IF {R16<3:0> NE 0} OR {R17<3:0> NE 0} OR {R16 EQ R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))               ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                      ! Try to set secondary interlock.
        R0 <- -1, {return}                    ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}             ! Retry exceeded
MB
IF {tmp1<3:1> NE 0} THEN                      ! Check Alignment
BEGIN                                         ! Release secondary interlock
    (R16) <- tmp1
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!    entry
!    header + tmp1
IF {all memory accesses can NOT be completed} THEN
BEGIN                                         ! Release secondary interlock
    (R16) <- tmp1
    {initiate memory management fault}
END

! All accesses can be done so enqueue the entry
tmp2 <- R16 - R17
(R17) <- tmp1 + tmp2                          ! Forward link
(R17 + 8) <- tmp2                             ! Backward link
(R16 + tmp1 + 8) <- -tmp1 - tmp2              ! Successor back link
MB
(R16) <- -tmp2                                ! Forward link of header
                                              ! Release the lock.
IF tmp1 EQ 0 THEN
    R0 <- 1                                   ! Queue was empty
ELSE
    R0 <- 0                                   ! Queue was not empty
END

Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL INSQHIQ        Insert into Quadword Queue at Head Interlocked


Description: 


If the secondary interlock is clear, INSQHIQ inserts the entry specified in R17 into the
self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. Before the insertion, the processor validates that the entire
operation can be completed. This ensures that if a memory management exception occurs, the
queue is left in a consistent state (see Chapters 3 and 6). If the instruction fails to acquire the
secondary interlock after "N" retry attempts, then (in the absence of exceptions) R0 is set to
-1. The value "N" is implementation dependent. \The selected initial value of N is 20.\
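

A quadword counterpart can be modeled the same way, using 64-bit links and octaword alignment. As before this is only a sketch: memory is a flat little-endian byte array, the interlock retry loop and memory management checks are left out, and the helper names (including the two walk functions) are hypothetical.

```python
import struct

def sext64(x):
    """Sign-extend the low 64 bits of x."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >> 63 else x

def rd64(mem, addr):
    return struct.unpack_from("<q", mem, addr)[0]

def wr64(mem, addr, val):
    struct.pack_into("<q", mem, addr, sext64(val))

def insqhiq(mem, header, entry):
    """Insert `entry` at the head of a self-relative quadword queue."""
    if header & 15 or entry & 15 or header == entry:
        raise ValueError("illegal operand")
    tmp1 = rd64(mem, header)
    if tmp1 & 1:
        return -1                               # secondary interlock already set
    wr64(mem, header, tmp1 | 1)                 # set secondary interlock
    if tmp1 & 0xE:
        wr64(mem, header, tmp1)                 # release secondary interlock
        raise ValueError("illegal operand")
    tmp2 = header - entry
    wr64(mem, entry, tmp1 + tmp2)               # forward link of entry
    wr64(mem, entry + 8, tmp2)                  # backward link of entry (to header)
    wr64(mem, header + tmp1 + 8, -tmp1 - tmp2)  # successor back link
    wr64(mem, header, -tmp2)                    # header forward link, releases lock
    return 1 if tmp1 == 0 else 0

def walk_forward(mem, header):
    out, cur = [], header + rd64(mem, header)
    while cur != header:
        out.append(cur)
        cur += rd64(mem, cur)
    return out

def walk_backward(mem, header):
    out, cur = [], header + rd64(mem, header + 8)
    while cur != header:
        out.append(cur)
        cur += rd64(mem, cur + 8)
    return out

mem = bytearray(0x100)
H = 0x10
status = [insqhiq(mem, H, 0x40), insqhiq(mem, H, 0x80)]
forward = walk_forward(mem, H)
backward = walk_backward(mem, H)
```

Walking the queue in both directions is a quick consistency check: a head insertion must leave the new entry's backward link pointing at the header, otherwise the backward walk would not terminate at the header.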

2.3.4 Insert Entry into Quadword Queue at Head Interlocked Resident


Format: 
CALL_PAL INSQHIQR ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!    -1 if the secondary interlock was set
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))               ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                      ! Try to set secondary interlock.
        R0 <- -1, {return}                    ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}             ! Retry exceeded
MB

tmp2 <- R16 - R17                             ! Enqueue the entry
(R17) <- tmp1 + tmp2                          ! Forward link of entry.
(R17 + 8) <- tmp2                             ! Backward link of entry.
(R16 + tmp1 + 8) <- -tmp1 - tmp2              ! Successor back link

MB
(R16) <- -tmp2                                ! Forward link of header,
                                              ! Release the lock
IF tmp1 EQ 0 THEN
    R0 <- 1                                   ! Queue was empty
ELSE
    R0 <- 0                                   ! Queue was not empty
END

Exceptions:


Illegal Operand 


Instruction mnemonics: 


CALL_PAL INSQHIQR Insert Entry into Quadword Queue at Head 
Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQHIQR inserts the entry specified in R17 into the
self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue header and
elements are octaword aligned. No alignment or memory management checks are made before
starting queue modifications to verify these requirements. Therefore, if any of these
requirements are not met, the queue may be left in an UNPREDICTABLE state and an illegal
operand fault may be reported.

2.3.5 Insert Entry into Longword Queue at Tail Interlocked


Format: 


CALL_PAL INSQTIL ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!    -1 if the secondary interlock was set
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location and
! that the header and entry are valid 32 bit addresses
IF {R16<2:0> NE 0} OR {R17<2:0> NE 0} OR {R16 EQ R17} OR
   {SEXT(R16<31:0>) NE R16} OR {SEXT(R17<31:0>) NE R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))               ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                      ! Try to set secondary interlock.
        R0 <- -1, {return}                    ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}             ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT(tmp0<63:32>)

IF {tmp1<2:1> NE 0} OR {tmp2<2:0> NE 0} THEN  ! Check Alignment
BEGIN                                         ! Release secondary interlock
    (R16) <- tmp0
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!    entry
!    header + (header + 4)
IF {all memory accesses can NOT be completed} THEN
BEGIN                                         ! Release secondary interlock
    (R16) <- tmp0
    {initiate memory management fault}
END

! All Accesses can be done so enqueue entry
tmp3 <- SEXT({R16 - R17}<31:0>)
(R17)<31:0> <- tmp3                           ! Forward link
(R17 + 4)<31:0> <- tmp2 + tmp3                ! Backward link
IF {tmp2 NE 0} THEN                           ! Forward link of predecessor
    (R16+tmp2)<31:0> <- -tmp3 - tmp2
ELSE
    tmp1 <- SEXT({-tmp3 - tmp2}<31:0>)
(R16+4)<31:0> <- -tmp3                        ! Backward link of header
MB
(R16)<31:0> <- tmp1                           ! Forward link, release lock
IF tmp1 EQ -tmp3 THEN
    R0 <- 1                                   ! Queue was empty
ELSE
    R0 <- 0                                   ! Queue was not empty
END


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL INSQTIL Insert into Longword Queue at Tail Interlocked 


Description: 


If the secondary interlock is clear, INSQTIL inserts the entry specified in R17 into the
self-relative queue preceding the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. Before performing any part of the operation, the processor
validates that the insertion can be completed. This ensures that if a memory management
exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the absence
of exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\
2.3.6 Insert Entry into Longword Queue at Tail Interlocked Resident


Format:
CALL_PAL INSQTILR ! PALcode format


Operation:


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was not empty before adding this entry
!      1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))            ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT(tmp0<63:32>)
tmp3 <- SEXT({R16 - R17}<31:0>)

(R17)<31:0> <- tmp3                        ! Forward link
(R17 + 4)<31:0> <- tmp2 + tmp3             ! Backward link
IF {tmp2 NE 0} THEN                        ! Forward link of predecessor
    (R16+tmp2)<31:0> <- -tmp3 - tmp2
ELSE
    tmp1 <- SEXT({-tmp3 - tmp2}<31:0>)
(R16+4)<31:0> <- -tmp3                     ! Backward link of header
MB
(R16)<31:0> <- tmp1                        ! Forward link
                                           ! Release the lock
IF tmp1 EQ -tmp3 THEN
    R0 <- 1                                ! Queue was empty
ELSE
    R0 <- 0                                ! Queue was not empty
END


Exceptions: 


Illegal Operand 


Instruction mnemonics: 


CALL_PAL INSQTILR Insert Entry into Longword Queue at Tail 
Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQTILR inserts the entry specified in R17 into the 
self-relative queue preceding the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue header and
elements are quadword aligned. No alignment or memory management checks are made before
starting queue modifications to verify these requirements. Therefore, if any of these
requirements are not met, the queue may be left in an UNPREDICTABLE state and an illegal
operand fault may be reported.
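
Because the resident form performs no checks of its own, a caller may want to validate its operands before issuing the instruction. The following Python sketch shows one way to express the alignment and distinctness requirements stated above. The function name is hypothetical, and memory residency cannot be checked this way; it is assumed to be guaranteed by other means.

```python
def resident_longword_args_ok(header, entry):
    """Return True if header and entry satisfy the stated INSQTILR operand rules."""
    # Quadword alignment means the low three address bits are zero.
    quadword_aligned = (header & 0x7) == 0 and (entry & 0x7) == 0
    # The header may not be used as the entry being inserted.
    distinct = header != entry
    return quadword_aligned and distinct
```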
2.3.7 Insert Entry into Quadword Queue at Tail Interlocked


Format:
CALL_PAL INSQTIQ ! PALcode format


Operation:


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was not empty before adding this entry
!      1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
!
! Check entry and header alignment and
! that the header and entry are not the same location
IF {R16<3:0> NE 0} OR {R17<3:0> NE 0} OR {R16 EQ R17} THEN
    BEGIN
        {illegal operand exception}
    END
N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))            ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp2 <- (R16+8)
IF {tmp1<3:1> NE 0} OR {tmp2<3:0> NE 0} THEN   ! Check Alignment.
    BEGIN                                  ! Release secondary interlock.
        (R16) <- tmp1
        {illegal operand exception}
    END

! Check if following addresses can be written
! without causing a memory management exception:
!     entry
!     header + (header + 8)
IF {all memory accesses can NOT be completed} THEN
    BEGIN                                  ! Release secondary interlock.
        (R16) <- tmp1
        {initiate memory management fault}
    END
! All accesses can be done so enqueue the entry
tmp3 <- R16 - R17

(R17) <- tmp3                              ! Forward link
(R17 + 8) <- tmp2 + tmp3                   ! Backward link
IF {tmp2 NE 0} THEN                        ! Forward link of predecessor
    (R16+tmp2) <- -tmp3 - tmp2
ELSE
    tmp1 <- {-tmp3 - tmp2}
(R16+8) <- -tmp3                           ! Backward link of header
MB
(R16) <- tmp1                              ! Forward link
                                           ! Release the lock
IF tmp1 EQ -tmp3 THEN
    R0 <- 1                                ! Queue was empty
ELSE
    R0 <- 0                                ! Queue was not empty
END


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL INSQTIQ Insert into Quadword Queue at Tail Interlocked 


Description: 


If the secondary interlock is clear, INSQTIQ inserts the entry specified in R17 into the self-rel- 
ative queue preceding the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. Before performing any part of the operation, the processor
validates that the insertion can be completed. This ensures that if a memory management
exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\
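
Since R0 can come back as -1 when the secondary interlock stays set, callers typically need a policy for that case. The Python sketch below shows one possible software-level retry wrapper. It is an assumption for illustration: insqtiq stands for any hypothetical callable that issues the instruction and returns the R0 value, and the attempt count and the choice of TimeoutError are arbitrary.

```python
def insert_with_retry(insqtiq, header, entry, attempts=3):
    """Call insqtiq(header, entry) until it returns something other than -1."""
    for _ in range(attempts):
        status = insqtiq(header, entry)
        if status != -1:
            # 0: queue was not empty, 1: queue was empty before the insert
            return status
    raise TimeoutError("secondary interlock still set after retries")
```

A caller that cannot tolerate an exception could instead return -1 to its own caller; the sketch only shows that the -1 status must be handled explicitly.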
2.3.8 Insert Entry into Quadword Queue at Tail Interlocked Resident


Format:
CALL_PAL INSQTIQR ! PALcode format


Operation:


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was not empty before adding this entry
!      1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))            ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp2 <- (R16+8)
tmp3 <- R16 - R17

(R17) <- tmp3                              ! Forward link
(R17 + 8) <- tmp2 + tmp3                   ! Backward link
IF {tmp2 NE 0} THEN                        ! Forward link of predecessor
    (R16+tmp2) <- -tmp3 - tmp2
ELSE
    tmp1 <- {-tmp3 - tmp2}
(R16+8) <- -tmp3                           ! Backward link of header
MB
(R16) <- tmp1                              ! Forward link and release the lock

IF tmp1 EQ -tmp3 THEN
    R0 <- 1                                ! Queue was empty
ELSE
    R0 <- 0                                ! Queue was not empty
END


Exceptions:


Illegal Operand 


Instruction mnemonics: 


CALL_PAL INSQTIQR Insert Entry into Quadword Queue at Tail 
Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQTIQR inserts the entry specified in R17 into the 
self-relative queue preceding the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise, it is set to 0. The
insertion is a non-interruptible operation. The insertion is interlocked to prevent concurrent
interlocked insertions or removals at the head or tail of the same queue by another process, in
a multiprocessor environment. If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue header and ele- 
ments are octaword aligned. No alignment or memory management checks are made before 
starting queue modifications to verify these requirements. Therefore, if any of these require- 
ments are not met, the queue may be left in an UNPREDICTABLE state and an illegal 
operand fault may be reported. 
2.3.9 Insert Entry into Longword Queue


Format:
CALL_PAL INSQUEL ! PALcode format


Operation:
! R16 contains the address of the predecessor entry
! or the 32 bit address of the 32 bit address of the
! predecessor entry for INSQUEL/D
! R17 contains the address of the new entry
! R0 receives status:
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Header and entries need only be byte aligned
! Must have write access to header and queue entries
IF opcode EQ INSQUEL/D THEN
    tmp2 <- SEXT((R16)<31:0>)              ! Address of predecessor
ELSE
    tmp2 <- R16

IF {all memory accesses can be completed} THEN
    BEGIN
        tmp1<31:0> <- SEXT((tmp2)<31:0>)   ! Get forward link
        (R17)<31:0> <- tmp1                ! Set forward link
        (R17 + 4)<31:0> <- tmp2            ! Backward link
        (SEXT((tmp2)<31:0>) + 4)<31:0> <- R17
                                           ! Backward link of successor
        (tmp2)<31:0> <- R17                ! Forward link of predecessor
        IF tmp1 EQ tmp2 THEN
            R0 <- 1
        ELSE
            R0 <- 0
    END
ELSE
    BEGIN
        {initiate fault}
    END
END


Exceptions: 
Access Violation 
Fault on Read 
Fault on Write
Translation Not Valid


Instruction mnemonics:


CALL_PAL INSQUEL Insert Entry into Longword Queue 
CALL_PAL INSQUEL/D Insert Entry into Longword Queue Deferred 
Description: 


INSQUEL inserts the entry specified in R17 into the absolute queue following the entry speci- 
fied by the predecessor addressed by R16. INSQUEL/D performs the same operation on the 
entry specified by the contents of the longword addressed by R16. The queue header and entry 
need only be byte aligned. 


In either case, if the entry inserted was the first one in the queue, a 1 is returned in R0;
otherwise, a 0 is returned in R0. The insertion is a non-interruptible operation. Before performing
any part of the insertion, the processor validates that the entire operation can be completed.
This ensures that if a memory management exception occurs, the queue is left in a consistent
state (see Chapters 3 and 6).
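
For comparison with the self-relative forms, the absolute-queue insertion can be modeled in a few lines. The Python sketch below is a simplified illustration, not PALcode: memory is assumed to be a little-endian bytearray, the helper names ld32, st32, and insquel are hypothetical, links are stored as unsigned 32-bit absolute addresses, and the up-front validation of memory accesses is omitted. An empty queue is represented by a header whose two links point at itself.

```python
import struct


def ld32(mem, addr):
    """Read an unsigned 32-bit little-endian longword (hypothetical helper)."""
    return struct.unpack_from("<I", mem, addr)[0]


def st32(mem, addr, value):
    """Write an unsigned 32-bit little-endian longword (hypothetical helper)."""
    struct.pack_into("<I", mem, addr, value)


def insquel(mem, pred, entry):
    """Insert entry after pred; return 1 if pred was the only element, else 0."""
    succ = ld32(mem, pred)        # tmp1: forward link of predecessor
    st32(mem, entry, succ)        # forward link of new entry
    st32(mem, entry + 4, pred)    # backward link of new entry
    st32(mem, succ + 4, entry)    # backward link of successor
    st32(mem, pred, entry)        # forward link of predecessor
    return 1 if succ == pred else 0
```

Inserting twice after the same header places the second entry in front of the first, which is the expected behavior when the predecessor is the queue header.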
2.3.10 Insert Entry into Quadword Queue


Format:
CALL_PAL INSQUEQ ! PALcode format


Operation:


! R16 contains the address of the predecessor entry
! or the address of the address of the
! predecessor entry for INSQUEQ/D
! R17 contains the address of the new entry
! R0 receives status:
!     0 if the queue was not empty before adding this entry
!     1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned

IF opcode EQ INSQUEQ/D THEN
    IF {R16<3:0> NE 0} THEN
        BEGIN
            {illegal operand exception}
        END
    tmp2 <- (R16)                          ! Address of predecessor
ELSE
    tmp2 <- R16
END
IF {tmp2<3:0> NE 0} OR {R17<3:0> NE 0} THEN
    BEGIN
        {illegal operand exception}
    END
IF {all memory accesses can be completed} THEN
    BEGIN
        tmp1 <- (tmp2)                     ! Get forward link of entry
        IF {tmp1<3:0> NE 0} THEN           ! Check alignment
            BEGIN
                {illegal operand exception}
            END
        (R17) <- tmp1                      ! Set forward link of entry
        (R17 + 8) <- tmp2                  ! Backward link of entry
        (tmp1 + 8) <- R17                  ! Backward link of successor
        (tmp2) <- R17                      ! Forward link of predecessor
        IF tmp1 EQ tmp2 THEN
            R0 <- 1
        ELSE
            R0 <- 0
    END
ELSE
    BEGIN
        {initiate fault}
    END
END


Exceptions: 
Access Violation 
Fault on Read 
Fault on Write 
Translation Not Valid 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL INSQUEQ Insert Entry into Quadword Queue 
CALL_PAL INSQUEQ/D Insert Entry into Quadword Queue Deferred 
Description: 


INSQUEQ inserts the entry specified in R17 into the absolute queue following the entry speci- 
fied by the predecessor addressed by R16. INSQUEQ/D performs the same operation on the 
entry specified by the contents of the quadword addressed by R16. 


In either case, if the entry inserted was the first one in the queue, a 1 is returned in R0;
otherwise, a 0 is returned in R0. The insertion is a non-interruptible operation. Before performing
any part of the insertion, the processor validates that the entire operation can be completed.
This ensures that if a memory management exception occurs, the queue is left in a consistent
state (see Chapters 3 and 6). R0 is UNPREDICTABLE if an exception occurs. The relative
order of reporting memory management and illegal operand exceptions is UNPREDICTABLE.
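
The difference between INSQUEQ and INSQUEQ/D, and the points at which octaword alignment is checked, can be illustrated with a short model. The Python sketch below is a hedged illustration only: the names ld64, st64, and insqueq and the deferred flag are hypothetical, memory is assumed to be a little-endian bytearray, a ValueError stands in for the illegal operand exception, and memory management validation is not modeled.

```python
import struct


def ld64(mem, addr):
    """Read an unsigned 64-bit little-endian quadword (hypothetical helper)."""
    return struct.unpack_from("<Q", mem, addr)[0]


def st64(mem, addr, value):
    """Write an unsigned 64-bit little-endian quadword (hypothetical helper)."""
    struct.pack_into("<Q", mem, addr, value)


def insqueq(mem, r16, entry, deferred=False):
    """Model INSQUEQ, or INSQUEQ/D when deferred is True; return the R0 value."""
    if deferred:
        if r16 & 0xF:
            raise ValueError("illegal operand")
        pred = ld64(mem, r16)     # R16 holds the address of the address
    else:
        pred = r16                # R16 holds the predecessor address itself
    if pred & 0xF or entry & 0xF:
        raise ValueError("illegal operand")
    succ = ld64(mem, pred)        # forward link of predecessor
    if succ & 0xF:
        raise ValueError("illegal operand")
    st64(mem, entry, succ)        # forward link of entry
    st64(mem, entry + 8, pred)    # backward link of entry
    st64(mem, succ + 8, entry)    # backward link of successor
    st64(mem, pred, entry)        # forward link of predecessor
    return 1 if succ == pred else 0
```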
2.3.11 Remove Entry from Longword Queue at Head Interlocked


Format:
CALL_PAL REMQHIL ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
!
! Check header alignment and
! that the header is a valid 32 bit address
IF {R16<2:0> NE 0} OR {SEXT(R16<31:0>) NE R16} THEN
    BEGIN
        {illegal operand exception}
    END
N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))            ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)

IF tmp1<2:0> NE 0 THEN                     ! Check Alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {illegal operand exception}
    END
! Check if the following can be done without
! causing a memory management exception:
!     read contents of header + tmp1 {if tmp1 NE 0}
!     write into header + tmp1 + (header + tmp1) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {initiate memory management fault}
    END

tmp2 <- SEXT({R16 + tmp1}<31:0>)
IF {tmp1 EQL 0} THEN
    tmp3 <- R16
ELSE
    tmp3 <- SEXT({tmp2 + SEXT((tmp2)<31:0>)})

IF tmp3<2:0> NE 0 THEN                     ! Check Alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {illegal operand exception}
    END

(tmp3 + 4)<31:0> <- R16 - tmp3             ! Backward link of successor
MB
(R16)<31:0> <- tmp3 - R16                  ! Forward link of header
                                           ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    BEGIN
        IF {tmp3 - R16} EQ 0 THEN
            R0 <- 2                        ! Queue now empty
        ELSE
            R0 <- 1                        ! Queue not empty
        END
    END
R1 <- tmp2                                 ! Address of removed entry


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid


Instruction mnemonics:


CALL_PAL REMQHIL Remove from Longword Queue at Head 
Interlocked 


Description: 


If the secondary interlock is clear, REMQHIL removes from the self-relative queue the entry
following the header, pointed to by R16, and the address of the removed entry is returned in
R1.


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is
returned in R0. If there was an entry to remove and the queue is not empty at the end of this
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start
of the removal and the queue is empty after the removal, a 2 is returned in R0. If the instruction
fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. Before performing any part of the removal, the pro- 
cessor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). 
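
The head-removal arithmetic can be traced with a small executable model. The Python sketch below is an illustration under stated assumptions, not PALcode: memory is a little-endian bytearray, the names ld32, st32, and remqhil are hypothetical, alignment and memory management checks are omitted, and the secondary interlock is only tested, not acquired with LOAD_LOCKED/STORE_CONDITIONAL. It returns a (status, address) pair standing in for R0 and R1.

```python
import struct


def ld32(mem, addr):
    """Read a signed 32-bit little-endian longword (hypothetical helper)."""
    return struct.unpack_from("<i", mem, addr)[0]


def st32(mem, addr, value):
    """Write a signed 32-bit little-endian longword (hypothetical helper)."""
    struct.pack_into("<i", mem, addr, value)


def remqhil(mem, header):
    """Return (status, address) mirroring the R0 and R1 results."""
    flink = ld32(mem, header)               # tmp1: forward link of header
    if flink & 1:                           # secondary interlock already set
        return -1, None
    removed = header + flink                # tmp2: entry after the header
    if flink == 0:
        succ = header                       # tmp3: queue was empty
    else:
        succ = removed + ld32(mem, removed)  # tmp3: entry after the removed one
    st32(mem, succ + 4, header - succ)      # backward link of successor
    st32(mem, header, succ - header)        # forward link of header
    if flink == 0:
        return 0, removed
    return (2 if succ == header else 1), removed
```

For a header at address 0 with entries at 16 and 32, three successive calls in this model yield statuses 1, 2, and 0.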
2.3.12 Remove Entry from Longword Queue at Head Interlocked Resident


Format:
CALL_PAL REMQHILR ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))            ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT({R16 + tmp1}<31:0>)
IF {tmp1 EQL 0} THEN
    tmp3 <- R16
ELSE
    tmp3 <- SEXT({tmp2 + SEXT((tmp2)<31:0>)})
END
(tmp3 + 4)<31:0> <- R16 - tmp3             ! Backward link of successor
MB
(R16)<31:0> <- tmp3 - R16                  ! Forward link of header
                                           ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    BEGIN
        IF {tmp3 - R16} EQ 0 THEN
            R0 <- 2                        ! Queue now empty
        ELSE
            R0 <- 1                        ! Queue not empty
        END
    END
R1 <- tmp2                                 ! Address of removed entry


Exceptions: 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL REMQHILR Remove Entry from Longword Queue at Head 
Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQHILR removes from the self-relative queue the entry
following the header, pointed to by R16, and the address of the removed entry is returned in
R1.


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is
returned in R0. If there was an entry to remove and the queue is not empty at the end of this
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start
of the removal and the queue is empty after the removal, a 2 is returned in R0. If the instruction
fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals at the
head or tail of the same queue by another process, in a multiprocessor environment. The
removal is a non-interruptible operation.


This instruction requires that the queue be memory resident and that the queue header and ele- 
ments are quadword aligned. No alignment or memory management checks are made before 
starting queue modifications to verify these requirements. Therefore, if any of these require- 
ments are not met, the queue may be left in an UNPREDICTABLE state and an illegal 
operand fault may be reported. 
2.3.13 Remove Entry from Quadword Queue at Head Interlocked


Format:
CALL_PAL REMQHIQ ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
!
! Check header alignment
IF {R16<3:0> NE 0} THEN
    BEGIN
        {illegal operand exception}
    END

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))            ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB
IF tmp1<3:0> NE 0 THEN                     ! Check Alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp1
        {illegal operand exception}
    END

! Check if the following can be done without
! causing a memory management exception:
!     read contents of header + tmp1 {if tmp1 NE 0}
!     write into header + tmp1 + (header + tmp1) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp1
        {initiate memory management fault}
    END

tmp2 <- R16 + tmp1
IF {tmp1 EQL 0} THEN
    tmp3 <- R16
ELSE
    tmp3 <- tmp2 + (tmp2)

IF tmp3<3:0> NE 0 THEN                     ! Check Alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp1
        {illegal operand exception}
    END

(tmp3 + 8) <- R16 - tmp3                   ! Backward link of successor
MB
(R16) <- tmp3 - R16                        ! Forward link of header
                                           ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    BEGIN
        IF {tmp3 - R16} EQ 0 THEN
            R0 <- 2                        ! Queue now empty
        ELSE
            R0 <- 1                        ! Queue not empty
        END
    END
R1 <- tmp2                                 ! Address of removed entry


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid


Instruction mnemonics:


CALL_PAL REMQHIQ Remove from Quadword Queue at Head 
Interlocked 


Description: 


If the secondary interlock is clear, REMQHIQ removes from the self-relative queue the entry
following the header, pointed to by R16, and the address of the removed entry is returned in
R1.


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is
returned in R0. If there was an entry to remove and the queue is not empty at the end of this
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. Before performing any part of the removal, the pro- 
cessor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). 
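
The four R0 values returned by the interlocked remove instructions form a small status code that callers need to decode. One possible way to do this is sketched below in Python. The enumeration and function names are hypothetical and are not defined by the architecture; only the numeric values come from the text above.

```python
from enum import IntEnum


class RemoveStatus(IntEnum):
    """R0 values returned by the interlocked remove instructions."""
    INTERLOCK_SET = -1            # secondary interlock could not be acquired
    QUEUE_WAS_EMPTY = 0           # nothing was removed
    REMOVED_QUEUE_NOT_EMPTY = 1   # entry removed, entries remain
    REMOVED_QUEUE_NOW_EMPTY = 2   # entry removed, queue is now empty


def entry_was_removed(r0):
    """Return True when R1 holds the address of a removed entry."""
    return RemoveStatus(r0) in (
        RemoveStatus.REMOVED_QUEUE_NOT_EMPTY,
        RemoveStatus.REMOVED_QUEUE_NOW_EMPTY,
    )
```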
2.3.14 Remove Entry from Quadword Queue at Head Interlocked Resident


Format:
CALL_PAL REMQHIQR ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp1 <- (R16))            ! Acquire hardware interlock.
    IF tmp1<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp2 <- R16 + tmp1
IF {tmp1 EQL 0} THEN
    tmp3 <- R16
ELSE
    tmp3 <- tmp2 + (tmp2)
END
(tmp3 + 8) <- R16 - tmp3                   ! Backward link of successor
MB
(R16) <- tmp3 - R16                        ! Forward link of header
                                           ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    IF {tmp3 - R16} EQ 0 THEN
        R0 <- 2                            ! Queue now empty
    ELSE
        R0 <- 1                            ! Queue not empty
END
R1 <- tmp2                                 ! Address of removed entry


Exceptions: 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL REMQHIQR Remove Entry from Quadword Queue at Head 
Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQHIQR removes from the self-relative queue the entry 
following the header, pointed to by R16, and the address of the removed entry is returned in 
R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is
returned in R0. If there was an entry to remove and the queue is not empty at the end of this
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue header and ele- 
ments are octaword aligned. No alignment or memory management checks are made before 
starting queue modifications to verify these requirements. Therefore, if any of these require- 
ments are not met, the queue may be left in an UNPREDICTABLE state and an illegal 
operand fault may be reported. 
2.3.15 Remove Entry from Longword Queue at Tail Interlocked


Format:
CALL_PAL REMQTIL ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
!
! Check header alignment and
! that the header is a valid 32 bit address
IF {R16<2:0> NE 0} OR {SEXT(R16<31:0>) NE R16} THEN
    BEGIN
        {illegal operand exception}
    END
N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))            ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)
tmp5 <- SEXT(tmp0<63:32>)

IF tmp5<2:0> NE 0 THEN                     ! Check alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {illegal operand exception}
    END

! Check if the following can be done without
! causing a memory management exception:
!     read contents of header + (header + 4) {if tmp1 NE 0}
!     write into header + (header + 4)
!         + (header + 4 + (header + 4)) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {initiate memory management fault}
    END

addr <- SEXT({R16 + tmp5}<31:0>)
tmp2 <- SEXT({addr + SEXT((addr+4)<31:0>)}<31:0>)

IF tmp2<2:0> NE 0 THEN                     ! Check alignment
    BEGIN                                  ! Release secondary interlock
        (R16) <- tmp0
        {illegal operand exception}
    END
(R16 + 4)<31:0> <- tmp2 - R16              ! Backward link of header

IF {tmp2 EQL R16} THEN
    (R16)<31:0> <- 0                       ! Forward link, release lock
ELSE
    BEGIN
        (tmp2)<31:0> <- R16 - tmp2         ! Forward link of predecessor
        MB
        (R16)<31:0> <- tmp1                ! Release lock
    END
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    BEGIN
        IF {tmp2 - R16} EQ 0 THEN
            R0 <- 2                        ! Queue now empty
        ELSE
            R0 <- 1                        ! Queue not empty
    END
R1 <- addr                                 ! Address of removed entry

Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 




Instruction mnemonics: 


CALL_PAL REMQTIL Remove from Longword Queue at Tail 
Interlocked 


Description: 


If the secondary interlock is clear, REMQTIL removes from the self-relative queue the entry 
preceding the header, pointed to by R16, and the address of the removed entry is returned in 
R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is
returned in R0. If there was an entry to remove and the queue is not empty at the end of this
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. Before performing any part of the removal, the pro- 
cessor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). 
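
Tail removal follows the header's backward link rather than its forward link, which the following model makes concrete. As with any such sketch, this Python code is illustrative and not PALcode: memory is assumed to be a little-endian bytearray, the names ld32, st32, and remqtil are hypothetical, alignment and memory management checks are left out, and the secondary interlock is only tested. The returned (status, address) pair stands in for R0 and R1.

```python
import struct


def ld32(mem, addr):
    """Read a signed 32-bit little-endian longword (hypothetical helper)."""
    return struct.unpack_from("<i", mem, addr)[0]


def st32(mem, addr, value):
    """Write a signed 32-bit little-endian longword (hypothetical helper)."""
    struct.pack_into("<i", mem, addr, value)


def remqtil(mem, header):
    """Return (status, address) mirroring the R0 and R1 results."""
    flink = ld32(mem, header)                 # tmp1: forward link of header
    if flink & 1:                             # secondary interlock already set
        return -1, None
    blink = ld32(mem, header + 4)             # tmp5: backward link of header
    removed = header + blink                  # addr: current tail entry
    pred = removed + ld32(mem, removed + 4)   # tmp2: entry before the tail
    st32(mem, header + 4, pred - header)      # backward link of header
    if pred == header:
        st32(mem, header, 0)                  # no entries remain
    else:
        st32(mem, pred, header - pred)        # forward link of predecessor
        st32(mem, header, flink)              # rewrite forward link, clearing the lock bit
    if flink == 0:
        return 0, removed
    return (2 if pred == header else 1), removed
```

For a header at address 0 with entries at 16 and 32, three successive calls in this model remove 32, then 16, then report an empty queue.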
2.3.16 Remove Entry from Longword Queue at Tail Interlocked Resident


Format:
CALL_PAL REMQTILR ! PALcode format


Operation:


! R16 contains the address of the queue header
! R0 receives status:
!     -1 if the secondary interlock was set
!      0 if the queue was empty
!      1 if entry removed and queue still not empty
!      2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                        ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))            ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                   ! Try to set secondary interlock.
        R0 <- -1, {return}                 ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}          ! Retry exceeded
MB

tmp1 <- SEXT(tmp0<31:0>)
tmp5 <- SEXT(tmp0<63:32>)
addr <- SEXT({R16 + tmp5}<31:0>)
tmp2 <- SEXT({addr + SEXT((addr+4)<31:0>)}<31:0>)

(R16 + 4)<31:0> <- tmp2 - R16              ! Backward link of header
IF {tmp2 EQL R16} THEN
    (R16)<31:0> <- 0                       ! Forward link, release lock
ELSE
    BEGIN
        (tmp2)<31:0> <- R16 - tmp2         ! Forward link of predecessor
        MB
        (R16)<31:0> <- tmp1                ! Release lock
    END
IF tmp1 EQ 0 THEN
    R0 <- 0                                ! Queue was empty
ELSE
    IF {tmp2 - R16} EQ 0 THEN
        R0 <- 2                            ! Queue now empty
    ELSE
        R0 <- 1                            ! Queue not empty
    END
END
R1 <- addr                                 ! Address of removed entry


Exceptions: 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL REMQTILR Remove Entry from Longword Queue at Tail 
Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQTILR removes from the self-relative queue the entry 
preceding the header, pointed to by R16, and the address of the removed entry is returned in 
R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is 
returned in R0. If there was an entry to remove and the queue is not empty at the end of this 
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start 
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the 
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the 
absence of exceptions) R0 is set to -1. The value "N" is implementation dependent. \The 
selected initial value of N is 20.\ 


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue header and 
elements are quadword aligned. No alignment or memory management checks are made before 
starting queue modifications to verify these requirements. Therefore, if any of these 
requirements are not met, the queue may be left in an UNPREDICTABLE state and an illegal 
operand fault may be reported. 
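
As an informal illustration only (it is not part of the architecture), the link arithmetic in the Operation section can be modeled in Python. The sketch assumes a single-threaded model, so the LOAD_LOCKED/STORE_CONDITIONAL loop and the MB instructions are omitted; memory is a little-endian `bytearray`, and the names `read_long`, `write_long`, and `remqtilr` are hypothetical helpers introduced for this example.

```python
import struct

def read_long(mem, addr):
    """Read a signed 32-bit little-endian longword from the memory model."""
    return struct.unpack_from("<i", mem, addr)[0]

def write_long(mem, addr, value):
    """Write a signed 32-bit little-endian longword into the memory model."""
    struct.pack_into("<i", mem, addr, value)

def remqtilr(mem, header):
    """Return (R0 status, R1 address); R1 is None when the status is -1."""
    flink = read_long(mem, header)              # tmp1: header forward link
    if flink & 1:                               # secondary interlock already set
        return -1, None
    blink = read_long(mem, header + 4)          # tmp5: header backward link
    addr = header + blink                       # entry preceding the header
    pred = addr + read_long(mem, addr + 4)      # tmp2: predecessor of that entry
    write_long(mem, header + 4, pred - header)  # backward link of header
    if pred == header:
        write_long(mem, header, 0)              # queue is now empty
    else:
        write_long(mem, pred, header - pred)    # forward link of predecessor
        write_long(mem, header, flink)          # forward link with lock bit clear
    if flink == 0:
        return 0, addr                          # queue was empty
    return (2 if pred == header else 1), addr

# Self-relative queue: header -> A -> B -> header, all quadword aligned.
mem = bytearray(256)
HEADER, A, B = 64, 128, 192
for entry, succ, pred in ((HEADER, A, B), (A, B, HEADER), (B, HEADER, A)):
    write_long(mem, entry, succ - entry)        # forward displacement
    write_long(mem, entry + 4, pred - entry)    # backward displacement
```

Calling `remqtilr(mem, HEADER)` repeatedly on this queue removes B, then A, and then reports an empty queue, which exercises the status values 1, 2, and 0 described above.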

2.3.17 Remove Entry from Quadword Queue at Tail Interlocked 


Format: 
CALL_PAL REMQTIQ ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.

! Check header alignment
IF {R16<3:0> NE 0} THEN
   BEGIN
   {illegal operand exception}
   END
N <- {retry_amount}                     ! Implementation-specific
REPEAT
   LOAD_LOCKED (tmp1 <- (R16))          ! Acquire hardware interlock.
   IF tmp1<0> EQ 1 THEN                 ! Try to set secondary interlock.
      R0 <- -1, {return}                ! Already set
   done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1} )
   N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded
MB

tmp5 <- (R16+8)

IF tmp5<3:0> NE 0 THEN                  ! Check Alignment
   BEGIN                                ! Release secondary interlock
   (R16) <- tmp1
   {illegal operand exception}
   END

! Check if the following can be done without
! causing a memory management exception:
!   read contents of header + (header + 8) {if tmp1 NE 0}
!   write into header + (header + 8)
!     + (header + 8 + (header + 8)){if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
   BEGIN                                ! Release secondary interlock
   (R16) <- tmp1
   {initiate memory management fault}
   END

addr <- R16 + tmp5
tmp2 <- addr + (addr + 8)
IF tmp2<3:0> NE 0 THEN                  ! Check alignment
   BEGIN                                ! Release secondary interlock
   (R16) <- tmp1
   {illegal operand exception}
   END

(R16 + 8) <- tmp2 - R16                 ! Backward link of header
IF {tmp2 EQL R16} THEN
   (R16) <- 0                           ! Forward link, release lock
ELSE
   BEGIN
   (tmp2) <- R16 - tmp2                 ! Forward link of predecessor
   MB
   (R16) <- tmp1                        ! Release lock
   END
END
IF tmp1 EQ 0 THEN
   R0 <- 0                              ! Queue was empty
ELSE
   BEGIN
   IF {tmp2 - R16} EQ 0 THEN
      R0 <- 2                           ! Queue now empty
   ELSE
      R0 <- 1                           ! Queue not empty
   END
END
R1 <- addr                              ! Address of removed entry
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 

Instruction mnemonics: 


CALL_PAL REMQTIQ Remove from Quadword Queue at Tail 
Interlocked 


Description: 


If the secondary interlock is clear, REMQTIQ removes from the self-relative queue the entry 
preceding the header, pointed to by R16, and the address of the removed entry is returned in 
R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is 
returned in R0. If there was an entry to remove and the queue is not empty at the end of this 
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start 
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the 
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the 
absence of exceptions) R0 is set to -1. The value "N" is implementation dependent. \The 
selected initial value of N is 20.\ 


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. Before performing any part of the removal, the pro- 
cessor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). 
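
The bounded retry loop that sets the secondary interlock can be sketched as follows. This is a hedged model, not PALcode: the header is a Python dictionary, `store_conditional` is a caller-supplied stand-in for STORE_CONDITIONAL, and `flaky_store` is a hypothetical helper that simulates a store failing a given number of times. The default of 20 retries mirrors the value quoted in the note above.

```python
def acquire_secondary_interlock(header, store_conditional, retries=20):
    """Return (acquired, original_flink); acquired False corresponds to R0 = -1."""
    n = retries
    while True:
        value = header["flink"]                      # LOAD_LOCKED
        if value & 1:
            return False, None                       # interlock already set
        done = store_conditional(header, value | 1)  # try to set bit 0
        n -= 1
        if done or n == 0:
            break
    if not done:
        return False, None                           # retry count exceeded
    return True, value


def flaky_store(failures):
    """Build a STORE_CONDITIONAL stand-in that fails `failures` times, then succeeds."""
    state = {"left": failures, "calls": 0}

    def store_conditional(header, value):
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1                       # simulated lost reservation
            return False
        header["flink"] = value
        return True

    store_conditional.state = state
    return store_conditional
```

Note that both failure paths produce the same status, so a caller that sees -1 cannot tell whether the interlock was already held or the retry count ran out; in either case the usual response is to retry the instruction.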

2.3.18 Remove Entry from Quadword Queue at Tail Interlocked Resident 


Format: 
CALL_PAL REMQTIQR ! PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! All parts of the Queue must be memory resident

N <- {retry_amount}                     ! Implementation-specific
REPEAT
   LOAD_LOCKED (tmp1 <- (R16))          ! Acquire hardware interlock.
   IF tmp1<0> EQ 1 THEN                 ! Try to set secondary interlock.
      R0 <- -1, {return}                ! Already set
   done <- STORE_CONDITIONAL ((R16) <- {tmp1 OR 1} )
   N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded
MB

tmp5 <- (R16+8)
addr <- R16 + tmp5
tmp2 <- addr + (addr + 8)

(R16 + 8) <- tmp2 - R16                 ! Backward link of header
IF {tmp2 EQL R16} THEN
   (R16) <- 0                           ! Forward link, release lock
ELSE
   BEGIN
   (tmp2) <- R16 - tmp2                 ! Forward link of predecessor
   MB
   (R16) <- tmp1                        ! Release lock
   END
END
IF tmp1 EQ 0 THEN
   R0 <- 0                              ! Queue was empty
ELSE
   IF {tmp2 - R16} EQ 0 THEN
      R0 <- 2                           ! Queue now empty
   ELSE
      R0 <- 1                           ! Queue not empty
   END
R1 <- addr                              ! Address of removed entry
Exceptions: 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL REMQTIQR Remove Entry from Quadword Queue at Tail 
Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQTIQR removes from the self-relative queue the entry 
preceding the header, pointed to by R16, and the address of the removed entry is returned in 
R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, a 0 is 
returned in R0. If there was an entry to remove and the queue is not empty at the end of this 
instruction, R0 is set to 1. If the interlock succeeded and the queue was not empty at the start 
of the removal, and the queue is empty after the removal, a 2 is returned in R0. If the 
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in the 
absence of exceptions) R0 is set to -1. The value "N" is implementation dependent. \The 
selected initial value of N is 20.\ 


The removal is interlocked to prevent concurrent interlocked insertions or removals at the 
head or tail of the same queue by another process, in a multiprocessor environment. The 
removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue header and ele- 
ments are octaword aligned. No alignment or memory management checks are made before 
starting queue modifications to verify these requirements. Therefore, if any of these require- 
ments are not met, the queue may be left in an UNPREDICTABLE state and an illegal 
operand fault may be reported. 

2.3.19 Remove Entry from Longword Queue 


Format: 
CALL_PAL REMQUEL ! PALcode format 


Operation: 


! R16 contains the address of the entry to remove
!     or the address of the 32 bit address of the
!     entry for REMQUEL/D
! R0 receives status:
!   -1 if the queue was empty
!    0 if the queue is empty after removing an entry
!    1 if the queue is not empty after removing an entry
! R1 receives the address of the removed entry
!
! Header and entries need only be byte aligned
! Must have write access to header and queue entries
IF opcode EQ REMQUEL/D THEN
   R1 <- SEXT((R16)<31:0>)
ELSE
   R1 <- SEXT(R16<31:0>)

IF {all memory accesses can be completed} THEN
   BEGIN
   tmp1 <- (R1)<31:0>                   ! Forward Link of Predecessor
   ((R1+4)<31:0>)<31:0> <- tmp1
   tmp2 <- (R1+4)<31:0>                 ! Backward Link of Successor
   ((R1)<31:0>+4)<31:0> <- tmp2
   R0 <- 1                              ! Queue not empty
   IF {tmp1 EQ tmp2} THEN
      R0 <- 0                           ! Queue now empty
   IF {R1 EQ tmp2} THEN
      R0 <- -1                          ! Queue was empty
   END
ELSE
   BEGIN
   {initiate fault}
   END
END
Exceptions: 


Access Violation 
Fault on Read 
Fault on Write 


Translation Not Valid 

Instruction mnemonics: 


CALL_PAL REMQUEL Remove Entry from Longword Queue 
CALL_PAL REMQUEL/D Remove Entry from Longword Queue Deferred 


Description: 


REMQUEL removes the entry addressed by R16 from the longword absolute queue. The 
address of the removed entry is returned in R1. REMQUEL/D performs the same operation on 
the queue entry addressed by the longword addressed by R16. The queue header and entry 
need only be byte aligned. 


In either case, if there was no entry in the queue to be removed, R0 is set to -1. If there was an 
entry to remove and the queue is empty at the end of this instruction, R0 is set to 0. If there 
was an entry to remove and the queue is not empty at the end of this instruction, R0 is set to 1. 
The removal is a non-interruptible operation. Before performing any part of the removal, the 
processor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). 
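
For readers who prefer executable notation, the absolute-queue unlink and the three status values can be sketched in Python. This is an illustrative model under stated assumptions: memory is a little-endian `bytearray`, addresses are small unsigned 32-bit values (so the SEXT step is a no-op), entries do not overlap, and `read_addr`, `write_addr`, and `remquel` are hypothetical names for this example. The deferred form (REMQUEL/D) and fault checking are not modeled.

```python
import struct

def read_addr(mem, addr):
    """Read an unsigned 32-bit little-endian absolute link."""
    return struct.unpack_from("<I", mem, addr)[0]

def write_addr(mem, addr, value):
    """Write an unsigned 32-bit little-endian absolute link."""
    struct.pack_into("<I", mem, addr, value)

def remquel(mem, entry):
    """Unlink `entry` and return (R0 status, R1 address of removed entry)."""
    flink = read_addr(mem, entry)        # tmp1: address of the successor
    blink = read_addr(mem, entry + 4)    # tmp2: address of the predecessor
    write_addr(mem, blink, flink)        # forward link of predecessor
    write_addr(mem, flink + 4, blink)    # backward link of successor
    if entry == blink:
        return -1, entry                 # entry links to itself: queue was empty
    if flink == blink:
        return 0, entry                  # only the header remains: queue now empty
    return 1, entry                      # queue still has entries

# Absolute queue: header H -> A -> B -> H.
mem = bytearray(64)
H, A, B = 16, 32, 48
for entry, succ, pred in ((H, A, B), (A, B, H), (B, H, A)):
    write_addr(mem, entry, succ)
    write_addr(mem, entry + 4, pred)
```

The order of the two final comparisons matters: as in the Operation section, the "queue was empty" test takes precedence over the "queue now empty" test.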

2.3.20 Remove Entry from Quadword Queue 


Format: 
CALL_PAL REMQUEQ ! PALcode format 


Operation: 


! R16 contains the address of the entry to remove
!     or address of address of entry for REMQUEQ/D
! R0 receives status:
!   -1 if the queue was empty
!    0 if the queue is empty after removing an entry
!    1 if the queue is not empty after removing an entry
! R1 receives the address of the removed entry
! Must have write access to header and queue entries
! Header and entries must be octaword aligned
IF opcode EQ REMQUEQ/D THEN
   IF {R16<3:0> NE 0} THEN
      BEGIN
      {illegal operand exception}
      END
   R1 <- (R16)
ELSE
   R1 <- R16
IF {R1<3:0> NE 0} THEN                  ! Check alignment
   BEGIN
   {illegal operand exception}
   END
IF {all memory accesses can be completed} THEN
   BEGIN
   tmp1 <- (R1)                         ! Forward link of Predecessor
   IF {tmp1<3:0> NE 0} THEN             ! Check alignment
      BEGIN
      {illegal operand exception}
      END
   tmp2 <- (R1+8)                       ! Find predecessor
   IF {tmp2<3:0> NE 0} THEN             ! Check alignment
      BEGIN
      {illegal operand exception}
      END
   (tmp2) <- tmp1                       ! Update Forward link of predecessor
   ((R1)+8) <- tmp2
   R0 <- 1                              ! Queue not empty
   IF {tmp1 EQ tmp2} THEN
      R0 <- 0                           ! Queue now empty
   IF {R1 EQ tmp2} THEN
      R0 <- -1                          ! Queue was empty
   END
ELSE
   BEGIN
   {initiate fault}
   END
END


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Translation Not Valid 
Illegal Operand 


Instruction mnemonics: 


CALL_PAL REMQUEQ Remove Entry from Quadword Queue 
CALL_PAL REMQUEQ/D Remove Entry from Quadword Queue Deferred 


Description: 


REMQUEQ removes the queue entry addressed by R16 from the quadword absolute queue. 
The address of the removed entry is returned in R1. REMQUEQ/D performs the same 
operation on the queue entry addressed by the quadword addressed by R16. 


In either case, if there was no entry in the queue to be removed, R0 is set to -1. If there was an 
entry to remove and the queue is empty at the end of this instruction, R0 is set to 0. If there 
was an entry to remove and the queue is not empty at the end of this instruction, R0 is set to 1. 
The removal is a non-interruptible operation. Before performing any part of the removal, the 
processor validates that the entire operation can be completed. This ensures that if a memory 
management exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). R0 
and R1 are UNPREDICTABLE if an exception occurs. The relative order of reporting 
memory management and illegal operand exceptions is UNPREDICTABLE. 
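
The "validate first, then modify" rule can be illustrated with a small Python sketch. It is a simplified model, not the architected behavior in full: the queue is a dictionary mapping each entry address to a `[forward, backward]` pair of absolute addresses, `IllegalOperand` is a hypothetical exception class standing in for the illegal operand exception, and memory management faults are not modeled.

```python
class IllegalOperand(Exception):
    """Stands in for the illegal operand exception in this model."""

def is_octaword_aligned(address):
    """True when the low four address bits are zero."""
    return (address & 0xF) == 0

def remqueq(queue, entry):
    """Unlink `entry`; return (R0 status, R1 address of removed entry)."""
    if not is_octaword_aligned(entry):
        raise IllegalOperand(hex(entry))
    flink, blink = queue[entry]
    for link in (flink, blink):
        if not is_octaword_aligned(link):
            raise IllegalOperand(hex(link))
    # Every check has passed; only now is the queue modified.
    queue[blink][0] = flink              # forward link of predecessor
    queue[flink][1] = blink              # backward link of successor
    if entry == blink:
        return -1, entry                 # queue was empty
    return (0 if flink == blink else 1), entry
```

Because all alignment checks complete before the first store, a misaligned link raises the exception and leaves every entry exactly as it was.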

2.4 Unprivileged VAX Compatibility PALcode Instructions 


The Alpha architecture provides the following PALcode instructions for use in translated 
VAX code. These instructions are not a permanent part of the architecture and will not be 
available in some future implementations. They are provided to help customers preserve VAX 
instruction atomicity assumptions in porting code from VAX to Alpha. These calls should be 
made from user mode. They must not be used by any code other than that generated by the VEST 
software translator and its supporting run-time code (TIE). 


\When they are removed from the architecture, it would be desirable if they trapped in a way 
that they could be functionally emulated in software for many years in the future, even if the 
atomicity is not retained in the software emulation. This would allow very old translated 
images to run in 1998 and beyond, but perhaps restricted to a single processor and with some 
restriction around AST delivery. 


They may be removed and not emulated after the first two full generations of Alpha implemen- 
tations, that is, about 1995. \ 

2.4.1 Atomic Move Operation 


Format: 
AMOVRR ! PALcode format 
AMOVRM ! PALcode format 
Operation: 


! R16 contains the first source 
! R17 contains the first destination address 
! R18 contains the first length 
! R19 contains the second source 
! R20 contains the second destination address 
! R21 contains the second length 
CASE
   AMOVRR:
      IF intr_flag EQ 0 THEN
         R18 <- 0
         {return}
      END

      intr_flag <- 0
      (R17) <- R16                      ! length specified by R18<1:0>
      (R20) <- R19                      ! length specified by R21<1:0>
      IF {both moves successful} THEN
         R18 <- 1
      ELSE
         R18 <- 0
      END
   AMOVRM:
      IF intr_flag EQ 0 THEN
         R18 <- 0
         {return}
      END

      intr_flag <- 0
      (R17) <- R16                      ! length specified by R18<1:0>
      IF R21<5:0> NE 0 THEN
         BEGIN
         IF R19<1:0> NE 0 OR R20<1:0> NE 0 THEN
            {Illegal operand exception}
         ELSE
            (R20) <- (R19)              ! length specified by R21<5:0>
         END
      IF {both moves successful} THEN
         R18 <- 1
      ELSE
         R18 <- 0
      END
ENDCASE

Exceptions: 

AMOVRR: Access Violation 
Fault On Write 
Translation Not Valid 

AMOVRM: Access Violation 
Fault On Read 
Fault On Write 
Illegal Operand 
Translation Not Valid 


Instruction mnemonics: 


CALL_PAL AMOVRR Atomic Move Register/Register 
CALL_PAL AMOVRM Atomic Move Register/Memory 
Description: 


Note: 
The CALL_PAL AMOVxx instructions exist only for the support of translated VAX code. 
They will be removed from the architecture at some time in the future. They must be used 
only in translated VAX code and its support routines (TIE). 


CALL_PAL AMOVRR 

The CALL_PAL AMOVRR instruction specifies two multiprocessor-safe register stores to 
arbitrary byte addresses. Either both stores are done or neither store is done. R18 is set to 1 if 
both stores are done, and 0 otherwise. The two source registers are R16 and R19. The two des- 
tination byte addresses are in R17 and R20. The two lengths are specified in R18<1:0> and 
R21<1:0>. The length encoding is as follows: 00 is store byte, 01 is store word, 10 is store 
longword, 11 is store quadword. The low 1, 2, 4, or 8 bytes of the source register are used, 
respectively. The unused bytes of the source registers are ignored. The unused bits of the 
length registers (R18<63:2> and R21<63:2>) should be zero (SBZ). 
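
The length encoding can be restated as a short Python sketch. It is illustrative only: `bytes_to_store` is a hypothetical helper, and little-endian byte order is assumed for the memory image.

```python
# Length field values from the text: 00 byte, 01 word, 10 longword, 11 quadword.
LENGTH_IN_BYTES = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}

def bytes_to_store(source, length_register):
    """Return the bytes a store of `source` would write for this length code.

    Only bits <1:0> of the length register are examined; the remaining bits
    should be zero (SBZ) and are ignored by this model.
    """
    size = LENGTH_IN_BYTES[length_register & 0b11]
    low = source & ((1 << (8 * size)) - 1)   # low 1, 2, 4, or 8 bytes of the source
    return low.to_bytes(size, "little")      # assumed little-endian memory order
```

For example, a length code of 10 (binary) applied to a source of 0x1122334455667788 selects the low longword, 0x55667788, and ignores the upper bytes of the register.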


If, upon entry to the PALcode routine, the intr_flag is clear, then the instruction sets R18 to 
zero and exits, doing no stores. Otherwise, intr_flag is cleared and the PALcode routine 
proceeds. This is the same per-processor intr_flag used by the RS and RC instructions. 


The AMOVRR memory addresses may be unaligned. If either store would result in a Transla- 
tion Not Valid fault, Fault on Write, or Access Violation fault, neither store is done and the 
corresponding fault is taken. If both stores would result in faults, it is UNPREDICTABLE 
which one is taken. 
Note: 

A fault does not set R18, since the instruction has not been completed. 

If both stores can be completed without faulting, they are both attempted using 
multiprocessor-safe LDQ_L..STQ_C sequences. If all the sequences store successfully with no 
interruption, the PALcode routine completes with R18 set to one. Otherwise, the PALcode 
routine completes with R18 set to zero. In addition, R16, R17, R19, R20, and R21 are 
UNPREDICTABLE upon return from the PALcode routine, even if an exception has occurred. 


If the destinations overlap, the stores must appear to be done in the order specified. 


CALL_PAL AMOVRM 


The CALL_PAL AMOVRM instruction specifies one multiprocessor safe register store to an 
arbitrary byte address, plus an atomic memory-to-memory move of 0 to 63 aligned longwords. 
Either the store and the move are both done in their entirety or neither is done. R18 is set to 
one if both are done, and zero otherwise. 


The first source register is R16, the first destination address is in R17, and the first length is in 
R18. These three are specified exactly as in AMOVRR. 


The second source address is in R19, the second destination address is in R20, and the second 
length is in R21<5:0>. The length is a longword length, in the range 0 to 63 longwords (0 to 
252 bytes). The unused bytes of the source register R16 are ignored. The unused bits of the 
length registers (R18<63:2> and R21<63:6>) should be zero (SBZ). 


If, upon entry to the PALcode routine, the intr_flag is clear, the instruction sets R18 to zero 
and exits, doing no stores. Otherwise, intr_flag is cleared and the PALcode routine proceeds. 
This is the same per-processor intr_flag used by the RS and RC instructions. 


The memory address in R17 may be unaligned. 


If the length for the move is zero, no move is done, no memory accesses are made via R19 and 
R20, and no fault checking of these addresses is done. In this case, the move is always consid- 
ered to have succeeded in determining the setting of R18. 


If the length in R21 is non-zero, the two addresses in R19 and R20 must be aligned longword 
addresses; otherwise, an Illegal Operand exception is taken. 


If either the store or the move would result in a Translation Not Valid, Fault on Read, Fault on 
Write, or Access Violation fault, neither is done and the corresponding fault is taken. If both 
would result in faults, it is UNPREDICTABLE which one is taken. 


Note: 
A fault does not set R18, since the instruction has not been completed. 


If both the store and the move can be completed without faulting, they are both attempted, 
using multiprocessor-safe LDQ_L..STQ_C sequences for the store. If all the operations store 
successfully with no interruption, the PALcode routine completes with R18 set to one. Other- 
wise, the PALcode routine completes with R18 set to zero. In addition, R16, R17, R19, R20, 
and R21 are UNPREDICTABLE upon return from the PALcode routine, even if an exception 
has occurred. 

If the memory fields overlap, the store must appear to be done first, followed by the move. 
The ordering of the reads and writes of the move is unspecified. Thus, if the move destination 
overlaps the move source, the move results are UNPREDICTABLE. 


These instructions contain no implicit MB. 
Notes: 


• Typically, these instructions would be used in a sequence starting with CALL_PAL RS 
  and ending with CALL_PAL AMOVxx, Bxx R18,label. The failure path from the 
  conditional branch would eventually go back to the RS instruction. When such a sequence 
  succeeds, it has done everything from the RS up to and including the CALL_PAL 
  AMOVxx completely with no interrupts or exceptions. 


• The CALL_PAL AMOVxx instruction is typically followed by a conditional branch on 
  R18. If the CALL_PAL AMOVxx is likely to succeed, the conditional branch should be 
  a forward branch on failure (BEQ R18,forward_label) or backward branch on success 
  (BNE R18,backward_label), to match the architected branch-prediction rule. 


• The CALL_PAL AMOVxx instruction must either do both stores or neither. If R18=0 
  upon return, then memory state must be unchanged. If the first STQ_C inside 
  AMOVRR succeeds (and thus has changed programmer-visible state in memory), the 
  PALcode routine must complete the second STQ_C also, and exit with R18=1. In 
  particular, if the failure loop around the second STQ_C is executed an excessive number of 
  times (due to perverse interference from another processor), the PALcode may not 
  "give up" and return with R18=0. 


• \To balance the desires for reasonable Alpha interrupt latency and VAX architectural 
  correctness, an ordinary store or a machine check after 32-256 iterations (larger 
  numbers for more than four SMP processors) is suggested.\ 
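
The retry pattern described in the first note can be sketched as a toy Python model. Everything here is an assumption made for illustration: `Processor`, `store_pair`, and the `interrupted_attempts` parameter are hypothetical, RS is modeled only as setting intr_flag, an interrupt is modeled only as clearing it, and faults and multiprocessor interference are ignored.

```python
class Processor:
    """Toy model of one processor's intr_flag and a sparse memory."""

    def __init__(self):
        self.intr_flag = 0
        self.memory = {}

    def rs(self):
        """Model of CALL_PAL RS: set the per-processor intr_flag."""
        self.intr_flag = 1

    def interrupt(self):
        """Assumed effect of an interrupt in this model: intr_flag is cleared."""
        self.intr_flag = 0

    def amovrr(self, addr1, value1, addr2, value2):
        """Do both stores and return 1, or do neither and return 0 (R18)."""
        if self.intr_flag == 0:
            return 0                     # sequence was interrupted: no stores
        self.intr_flag = 0
        self.memory[addr1] = value1
        self.memory[addr2] = value2
        return 1


def store_pair(cpu, addr1, value1, addr2, value2, interrupted_attempts=()):
    """Repeat the RS ... AMOVRR sequence until it succeeds; return the attempt count."""
    attempt = 0
    while True:
        attempt += 1
        cpu.rs()                         # start of the atomic sequence
        if attempt in interrupted_attempts:
            cpu.interrupt()              # simulated interrupt inside the sequence
        if cpu.amovrr(addr1, value1, addr2, value2):
            return attempt               # success path
        # failure path: branch back to the RS and try again
```

In this model a failed attempt leaves memory untouched, which is the property the third note requires of a return with R18=0.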

2.5 Unprivileged PALcode Thread Instructions 


The PALcode thread instructions provide support for multithread implementations, which 
require that a given thread be able to generate a reproducible unique value in a "timely" 
fashion. This value can then be used to index into a structure or otherwise generate additional 
thread unique data. 


The two instructions in Table 2-4 are provided to read and write a process unique value from 
the process's hardware context. 


Table 2-4: Unprivileged PALcode Thread Instructions 


Mnemonic Operation 


READ_UNQ Read unique context 
WRITE_UNQ Write unique context 


The process-unique value is stored in the HWPCB at [HWPCB+72] when the process is not 
active. When the process is active, the process unique value can be cached in hardware inter- 
nal storage or reside in the HWPCB only. 

2.5.1 Read Unique Context 


Format: 


CALL_PAL READ_UNQ ! PALcode format 


Operation: 
IF {internal storage for process unique context} THEN
   R0 <- {process unique context}
ELSE
   R0 <- (HWPCB+72)


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL READ_UNQ Read Unique Context 


Description: 


The READ_UNQ instruction causes the hardware process (thread) unique context value to be 
placed in R0. If this value has not previously been written using a CALL_PAL WRITE_UNQ 
or stored into the quadword in the HWPCB at [HWPCB+72] while the thread was inactive, 
the result returned in R0 is UNPREDICTABLE. Implementations can cache this unique 
context value while the hardware process is active. The unique context may be thought of as a 
"slow register." Typically, this value will be used by software to establish a unique context for 
a given thread of execution. 

2.5.2 Write Unique Context 


Format: 
CALL_PAL WRITE_UNQ ! PALcode format 


Operation: 


! R16 contains value to be written to the hardware process
! unique context


IF {internal storage for process unique context} THEN
   {process unique context} <- R16
ELSE
   (HWPCB+72) <- R16


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL WRITE_UNQ Write Unique Context 


Description: 


The WRITE_UNQ instruction causes the value of R16 to be stored in internal storage for 
hardware process (thread) unique context, if implemented, or in the HWPCB at [HWPCB+72], if 
the internal storage is not implemented. When the process is context switched, SWPCTX 
ensures that this value is stored in the HWPCB at [HWPCB+72]. Implementations can cache 
this unique context value in internal storage while the hardware process is active. The unique 
context may be thought of as a "slow register." Typically, this value will be used by software 
to establish a unique context for a given thread of execution. 
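
The interaction between READ_UNQ, WRITE_UNQ, and the HWPCB can be sketched in Python. This is a hedged model only: `HardwareProcess` and its methods are hypothetical names, the HWPCB is represented as a 128-byte little-endian `bytearray`, and `swap_out` models just the part of SWPCTX that saves the unique value.

```python
import struct

HWPCB_UNQ_OFFSET = 72   # quadword at [HWPCB+72], as given in the text

class HardwareProcess:
    """Toy model of the process unique value for one hardware process."""

    def __init__(self, has_internal_storage):
        self.hwpcb = bytearray(128)          # assumed HWPCB size for this model
        self.has_internal_storage = has_internal_storage
        self.internal_unique = 0             # cached copy, if implemented

    def hwpcb_unique(self):
        """Return the quadword currently stored at [HWPCB+72]."""
        return struct.unpack_from("<Q", self.hwpcb, HWPCB_UNQ_OFFSET)[0]

    def write_unq(self, r16):
        """Model of CALL_PAL WRITE_UNQ."""
        if self.has_internal_storage:
            self.internal_unique = r16
        else:
            struct.pack_into("<Q", self.hwpcb, HWPCB_UNQ_OFFSET, r16)

    def read_unq(self):
        """Model of CALL_PAL READ_UNQ; the return value plays the role of R0."""
        if self.has_internal_storage:
            return self.internal_unique
        return self.hwpcb_unique()

    def swap_out(self):
        """Unique-value portion of SWPCTX: make sure the HWPCB copy is current."""
        if self.has_internal_storage:
            struct.pack_into("<Q", self.hwpcb, HWPCB_UNQ_OFFSET, self.internal_unique)
```

With internal storage, the HWPCB copy may be stale while the process is active and becomes current only when the context is swapped out; without it, every write goes directly to the HWPCB.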

2.6 Privileged PALcode Instructions 


Privileged instructions can be called in kernel mode only; otherwise, a privileged instruction 
exception occurs. The following privileged instructions are provided: 


Table 2-5: PALcode Privileged Instructions Summary 


Mnemonic    Operation 


CFLUSH      Cache flush 
CSERVE      Console service 
DRAINA      Drain abort. See Common Architecture, Chapter 6. 
HALT        Halt processor. See Common Architecture, Chapter 6. 
LDQP        Load quadword physical 
MFPR        Move from processor register 
MTPR        Move to processor register 
STQP        Store quadword physical 
SWPCTX      Swap privileged context 
SWPPAL      Swap PALcode image 

2.6.1 Cache Flush 


Format: 
CALL_PAL CFLUSH ! PALcode format 


Operation: 


! R16 contains the Page Frame Number (PFN) 
! of the page to be flushed 


IF PS<CM> NE 0 THEN 
{privileged instruction exception} 


{Flush page out of cache(s)} 


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL CFLUSH Cache Flush 


Description: 


The CFLUSH instruction may be used to flush an entire physical page specified by the PFN in 
R16 from any data caches associated with the current processor. All processors must imple- 
ment this instruction. 


On processors that implement a backup power option that maintains only the contents of mem- 
ory during a powerfail, this instruction is used by the powerfail interrupt handler to force data 
written by the handler to the battery backed-up main memory. After a CFLUSH, the first sub- 
sequent load (on the same processor) to an arbitrary address in the target page is either fetched 
from physical memory or from the data cache of another processor. 


In some multiprocessor systems, CFLUSH is not sufficient to ensure that the data are actually 
written to memory and not exchanged between processor caches. Additional platform-specific 
cooperation between the powerfail interrupt handlers executing on each processor may be 
required. 


On systems that implement other backup power options (including none), CFLUSH may 
return without affecting the data cache contents. To order CFLUSH properly with respect to 
preceding writes, an MB instruction is needed before the CFLUSH; to order CFLUSH prop- 
erly with respect to subsequent reads, an MB instruction is needed after the CFLUSH. 


\On systems that implement some form of NVRAM, PALcode must ensure that the given page 
is in memory before returning to the caller, regardless of the backup power option (including 
none).\ 


2.6.2 Console Service 


Format: 
CALL_PAL CSERVE ! PALcode format 


Operation: 
! Implementation specific 


IF PS<CM> NE 0 THEN 
{Privileged instruction exception} 


ELSE 
{Implementation-dependent action} 


Exceptions: 


Privileged Instruction 


Instruction Mnemonics: 


CALL_PAL CSERVE Console Service 


Description: 


This instruction is specific to each PALcode and console implementation and is not intended for 
operating system use. 


\The console can implement the generic console I/O routines by using CSERVE to enter and 
exit console I/O mode. The CSERVE instruction is primarily used in the generic console I/O 
callback routines for virtual-to-physical address translation. Since the PALcode image used by 
the operating system can differ from that used by the console, the console might not have 
direct knowledge of the active memory management policy. Therefore, the console uses 
CSERVE to get the physical address. \ 

2.6.3 Load Quadword Physical 


Format: 
CALL_PAL LDQP ! PALcode format 


Operation: 


! R16 contains the quadword-aligned physical address
! R0 receives the data from memory


IF PS<CM> NE 0 THEN
   {Privileged Instruction exception}


R0 <- (R16) {physical access}


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL LDQP Load Quadword Physical 


Description: 


The LDQP instruction fetches and writes to R0 the quadword-aligned memory operand, whose 
physical address is in R16. 


If the operand address in R16 is not quadword aligned, the result is UNPREDICTABLE. 

2.6.4 Move from Processor Register 


Format: 
CALL_PAL MFPR_IPR_Name ! PALcode format 


Operation: 


IF PS<CM> NE 0 THEN 
{privileged instruction exception} 


! R16 may contain an IPR specific source operand 
R0 <- result of IPR specific function 


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL MFPR_xxx Move from Processor Register xxx 


Description: 


The MFPR_xxx instruction reads the internal processor register specified by the PALcode 
function field and writes it to R0. 


Registers R1, R16, and R17 contain UNPREDICTABLE results after an MFPR. 


See Chapter 5 for a description of each IPR. 

2.6.5 Move to Processor Register 


Format: 


CALL_PAL MTPR_IPR_Name ! PALcode format 


Operation: 


IF PS<CM> NE 0 THEN
   {privileged instruction exception}

! R16 may contain an IPR specific source operand
R0 <- result of IPR specific function
IPR <- result of IPR specific function
Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL MTPR_xxx Move to Processor Register xxx 


Description: 
The MTPR_xxx instruction writes the IPR-specific source operands in integer registers R16 
and R17 (R17 reserved for future use) to the internal processor register specified by the 
PALcode function field. The effect produced by loading a processor register is guaranteed to be 
active on the next instruction. 


Registers R1, R16, and R17 contain UNPREDICTABLE results after an MTPR. The MTPR 
may return results in R0. If the specific IPR being accessed does not return results in R0, then 
R0 contains an UNPREDICTABLE result after an MTPR. 


See Chapter 5 for a description of each IPR. 

2.6.6 Store Quadword Physical 


Format: 
CALL_PAL STQP ! PALcode format 


Operation: 


! R16 contains the quadword aligned physical address 
! R17 contains the data to be written 


IF PS<CM> NE 0 THEN
   {Privileged Instruction exception}


(R16) <- R17 {physical access}


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 
CALL_PAL STQP Store Quadword Physical 


Description: 


The STQP instruction writes the quadword contents of R17 to the memory location whose 
physical address is in R16. 


If the operand address in R16 is not quadword aligned, the result is UNPREDICTABLE.


2.6.7 Swap Privileged Context


Format: 
CALL_PAL SWPCTX ! PALcode format 


Operation: 
! R16 contains the physical address of the new HWPCB.

! check HWPCB alignment

IF R16<6:0> NE 0 THEN
    {reserved operand exception}
IF {PS<CM> NE 0} THEN
    {privileged instruction exception}

! Store old HWPCB contents

(IPR_PCBB + HWPCB_KSP) ← SP
IF {internal registers for stack pointers} THEN
BEGIN
    (IPR_PCBB + HWPCB_ESP) ← IPR_ESP
    (IPR_PCBB + HWPCB_SSP) ← IPR_SSP
    (IPR_PCBB + HWPCB_USP) ← IPR_USP
END

IF {internal registers for ASTxx} THEN
BEGIN
    (IPR_PCBB + HWPCB_ASTSR) ← IPR_ASTSR
    (IPR_PCBB + HWPCB_ASTEN) ← IPR_ASTEN
END
tmp1 ← PCC
tmp2 ← ZEXT(tmp1<31:0>)
tmp3 ← ZEXT(tmp1<63:32>)
(IPR_PCBB + HWPCB_PCC) ← {tmp2 + tmp3}<31:0>
IF {internal storage for process unique value} THEN
BEGIN
    (IPR_PCBB + HWPCB_UNQ) ← process unique value
END

! Load new HWPCB contents
IPR_PCBB ← R16

IF {ASNs not implemented in virtual instruction cache} THEN
    {flush instruction cache}
IF {ASNs not implemented in TB} THEN
    IF {IPR_PTBR NE (IPR_PCBB + HWPCB_PTBR)} THEN
        {invalidate trans. buffer entries with PTE<ASM> EQ 0}
ELSE
    IPR_ASN ← (IPR_PCBB + HWPCB_ASN)

SP ← (IPR_PCBB + HWPCB_KSP)
IF {internal registers for stack pointers} THEN
BEGIN
    IPR_ESP ← (IPR_PCBB + HWPCB_ESP)
    IPR_SSP ← (IPR_PCBB + HWPCB_SSP)
    IPR_USP ← (IPR_PCBB + HWPCB_USP)
END

IPR_PTBR ← (IPR_PCBB + HWPCB_PTBR)

IF {internal registers for ASTxx} THEN
BEGIN
    IPR_ASTSR ← (IPR_PCBB + HWPCB_ASTSR)
    IPR_ASTEN ← (IPR_PCBB + HWPCB_ASTEN)
END

IPR_FEN ← (IPR_PCBB + HWPCB_FEN)
tmp4 ← ZEXT((IPR_PCBB + HWPCB_PCC)<31:0>)
tmp4 ← tmp4 - tmp2
PCC<63:32> ← tmp4<31:0>
IF {internal storage for process unique value} THEN
BEGIN
    process unique value ← (IPR_PCBB + HWPCB_UNQ)
END
IF {internal storage for Data Alignment trap setting} THEN
BEGIN
    DAT ← (IPR_PCBB + HWPCB_DAT)
END


Exceptions: 


Reserved Operand 
Privileged Instruction 


Instruction mnemonics: 


CALL_PAL SWPCTX Swap Privileged Context 


Description: 


The SWPCTX instruction returns ownership of the current Hardware Privileged Context 
Block (HWPCB) to the operating system and passes ownership of the new HWPCB to the pro- 
cessor. The HWPCB is described in Chapter 4. 


SWPCTX saves the privileged context from the internal processor registers into the HWPCB
specified by the physical address in the PCBB internal processor register. It then loads the
privileged context from the new HWPCB specified by the physical address in R16. The actual
sequence of the save and restore operation is not specified, so any overlap of the current and
new HWPCB storage areas produces UNDEFINED results.


The privileged context includes the four stack pointers, the Page Table Base Register (PTBR),
the Address Space Number (ASN), the AST enable and summary registers, the Floating-point
Enable Register (FEN), the Performance Monitor (PME) register, the Data Alignment Trap
(DAT) register, and the Charged Process Cycles, which is the number of PCC register counts
that are charged to a process (modulo 2**32).
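
The modulo 2**32 bookkeeping for the Charged Process Cycles can be illustrated with a short Python sketch. It models only the PCC arithmetic from the SWPCTX operation, assuming the 64-bit PCC is available as a plain integer; the function names are illustrative and not part of the architecture.

```python
MASK32 = 0xFFFFFFFF

def save_charged_cycles(pcc: int) -> int:
    """Value stored in HWPCB_PCC on save: {PCC<31:0> + PCC<63:32>}<31:0>."""
    counter = pcc & MASK32          # tmp2: free-running low half
    offset = (pcc >> 32) & MASK32   # tmp3: per-process offset in the high half
    return (counter + offset) & MASK32

def load_pcc_offset(hwpcb_pcc: int, pcc: int) -> int:
    """New PCC<63:32> on load: {HWPCB_PCC<31:0> - PCC<31:0>}<31:0>."""
    counter = pcc & MASK32          # tmp2 from the same PCC read
    return ((hwpcb_pcc & MASK32) - counter) & MASK32

# Example: counter reads 100, outgoing process offset is 50.
old_pcc = (50 << 32) | 100
saved = save_charged_cycles(old_pcc)        # cycles charged to the old process
new_offset = load_pcc_offset(1000, old_pcc) # incoming process was charged 1000
```

With these values, the incoming process sees counter + offset equal to its saved charge at the moment of the switch, and the sum advances as the counter advances.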


PTBR is never saved in the HWPCB and it is UNPREDICTABLE whether or not ASN is 
saved. These values cannot be changed for a running process. The process integer and floating 
registers are saved and restored by the operating system. See Figure 4-1 for the HWPCB
format. 


Notes: 


• Any change to the current HWPCB while the processor has ownership results in
UNDEFINED operation.


• All the values in the current HWPCB can be read through IPRs, except the Charged
Process Cycles.


• If the HWPCB is read while ownership resides with the processor, it is UNPREDICTABLE
whether the original or an updated value of a field is read. The processor can
update an HWPCB field at any time. The decision as to whether or not a field is
updated is made individually for each field.


• If the enabling conditions are present for an interrupt at the completion of this
instruction, the interrupt occurs before the next instruction.


• PALcode sets up the PCBB at boot time to point to the HWPCB storage area in the
Hardware Restart Parameter Block (HWRPB). See Console Interface (III), Chapter 2.


• The operation is UNDEFINED if SWPCTX accesses a non-memory-like region.


• A reference to nonexistent memory causes a machine check. Unimplemented physical
address bits are SBZ. The operation is UNDEFINED if any of these bits are set.


Note: 


Processors may keep a copy of each of the per-process stack pointers in internal registers. 
In those processors, SWPCTX stores the internal registers into the HWPCB. Processors 
that do not keep a copy of the stack pointers in internal registers keep only the stack 
pointer for the current access mode in SP and switch this with the HWPCB contents 
whenever the current access mode changes.


2.6.8 Swap PALcode Image


Format: 


CALL_PAL SWPPAL ! PALcode format 


Operation: 


! R16 contains the new PALcode identifier
! R17-R21 contain implementation-specific entry parameters
! R0 receives status:
!   0 Success (PALcode was switched)
!   1 Unknown PALcode variant
!   2 Known PALcode variant, but PALcode not loaded

IF (PS<CM> NE 0) THEN
    {Privileged instruction exception}
ELSE
    IF {R16 < 256} THEN
    BEGIN
        IF {R16 invalid} THEN
            R0 ← 1
            {Return}
        ELSE IF {PALcode not loaded} THEN
            R0 ← 2
            {Return}
        ELSE
            tmp1 ← {PALcode base}
    END
    ELSE
        tmp1 ← R16
    {Flush instruction cache}
    {Invalidate all translation buffers}
    {Perform additional PALcode variant-specific initialization}
    {Transfer control to PALcode entry at physical address in tmp1}


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL SWPPAL Swap PALcode Image 


Description: 


The SWPPAL instruction causes the current (active) PALcode to be replaced by the specified
new PALcode image. This instruction is intended for use by operating systems only during
bootstraps and by consoles during transitions to console I/O mode.


The PALcode descriptor contained in R16 is interpreted as either a PALcode variant or the
base physical address of the new PALcode image. If a variant, the PALcode image must have
been previously loaded. No PALcode loading occurs as a result of this instruction.
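
How R16 is interpreted can be sketched in Python. This is a minimal model of the decision in the operation pseudocode, not an implementation: the `known_variants` set and the `loaded_images` mapping (variant number to base physical address) are hypothetical stand-ins for implementation-specific PALcode state.

```python
def swppal_target(r16, known_variants, loaded_images):
    """Return (status, base_address) following the SWPPAL status codes.

    status 0: success, base_address is where control would transfer
    status 1: unknown PALcode variant
    status 2: known PALcode variant, but PALcode not loaded
    """
    if r16 < 256:
        # R16 names a PALcode variant.
        if r16 not in known_variants:
            return 1, None
        if r16 not in loaded_images:
            return 2, None
        return 0, loaded_images[r16]
    # Otherwise R16 is the base physical address of the new image.
    return 0, r16

# Hypothetical state: variants 0..2 are known, only variant 2 is loaded.
known = {0, 1, 2}
loaded = {2: 0x20000}
```

Calling `swppal_target(2, known, loaded)` selects the loaded image, while any value of 256 or more is passed through unchanged as a physical address.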


After successful PALcode switching, the register contents are determined by the parameters
passed in R17 through R21 or are UNPREDICTABLE. A common parameter is the address of
a new HWPCB. In this case, the stack pointer register and PTBR are determined by the
contents of that HWPCB; the contents of other registers such as R16 through R21 may be
UNPREDICTABLE.


See Console Interface Architecture (III) for information on using this instruction.


2.6.9 Wait for Interrupt


Format: 
CALL_PAL WTINT ! PALcode format 


Operation: 


! R16 contains the maximum number of interval clock ticks to skip
! R0 receives the number of interval clock ticks actually skipped

IF {implemented} THEN
BEGIN
    IF {Implementation supports skipping multiple
        clock interrupts} THEN
        {Ticks_to_skip ← R16}

    {Wait no longer than any non-clock interrupt or the first clock
     interrupt after ticks_to_skip ticks have been skipped}

    IF {Implementation supports skipping multiple
        clock interrupts} THEN
        R0 ← number of interval clock ticks actually skipped
    ELSE
        R0 ← 0
END
ELSE
    R0 ← 0
{return}


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL WTINT Wait for Interrupt 


Description: 


The WTINT instruction requests that, if possible, the PALcode wait for the first of either of 
the following conditions before returning: 


• Any interrupt other than a clock tick
• The first clock tick after a specified number of clock ticks has been skipped


The WTINT instruction returns in R0 the number of clock ticks that are skipped. The number
returned in R0 is zero on hardware platforms that implement this instruction, but where it is
not possible to skip clock ticks.


The operating system can specify a full 64-bit integer value in R16 as the maximum number
of interval clock ticks to skip. A value of zero in R16 causes no clock ticks to be skipped.


Note the following if specifying in R16 the maximum number of interval clock ticks to skip: 


• Adherence to a specified value in R16 is at the discretion of the PALcode; the PALcode
may complete execution of WTINT and proceed to the next instruction at any time up
to the specified maximum, even if no interrupt or interval-clock tick has occurred. That
is, WTINT may return before all requested clock ticks are skipped.


• The PALcode must complete execution of WTINT if an interrupt occurs or if an
interval-clock tick occurs after the requested number of interval-clock ticks has been
skipped.
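
The return conditions can be modeled with a small Python sketch. It assumes a PALcode that supports skipping multiple clock interrupts and always waits for the full requested count; real PALcode may return earlier. The event strings and the function name are illustrative only.

```python
def wtint_ticks_skipped(events, max_skip):
    """Model the R0 result of WTINT for a sequence of interrupt events.

    events   -- iterable of "tick" (interval clock) or any other string
                (a non-clock interrupt)
    max_skip -- the value passed in R16
    """
    skipped = 0
    for event in events:
        if event != "tick":
            break                  # any non-clock interrupt ends the wait
        if skipped == max_skip:
            break                  # first tick after max_skip ticks were skipped
        skipped += 1               # this tick is skipped
    return skipped
```

For example, with R16 = 2 and three consecutive ticks, two ticks are skipped and the third ends the wait; with R16 = 0 the first tick ends the wait and nothing is skipped.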


In a multiprocessor environment, only the issuing processor is affected by an issued WTINT 
instruction. The counters, SCC and PCC, may increment at a lower rate or may stop entirely 
during WTINT execution. This side effect is implementation dependent.


2.7 Revision History


Revision 7.0, November 13, 1997 


1. Added ECO 99, Extended VA
2. Added ECO 91, WTINT instruction
3. Added ECO 80, R16 to BUGCHK instruction
4. Added ECO 101, CLRFEN instruction
5. Alpha AXP -> Alpha
6. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994 


1. Alpha -> Alpha AXP
2. Added ECO 64, AMOVxx clarification
3. Added ECO 52, CFLUSH instruction
4. Added CSERVE and SWPPAL instructions


Revision 5.0, May 12, 1992 


1. Changed attempt to acquire secondary lock to retry value
2. Modified RSCC and CFLUSH descriptions
3. Removed DRAINA to common PAL chapter
4. Added ECO #29 GENTRAP
5. Added ECO #27 (octaword aligned queues)
6. Added secondary interlock information
7. Added ECO #31 & #44 (AMOVxx PALcode instructions)
8. Added format editing for instructions
9. Added resident Queue Instructions ECO #28
10. IMB and HALT moved to Common PALcode Section
11. Removed priv inst tests from RSCC (an unpriv instruction)
12. Clean up the format for instructions
13. Converted to SDML
14. Added ECO #21, #23, #26
15. Identify queue type, for Queue instructions
16. Modified REI pseudocode
17. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991 


1. Put in ECO for PAL Thread Instructions
2. Put in ECO requiring current stack be writable for REI instruction
3. Put in ECO requiring REMQUEx/D to return address of removed entry in R1
4. Typos
5. Correct cross reference to section 'Replacement of standard PALcode'
6. Impose uniform usage of CASE pseudocode construct
7. Clarify use of R17 and R0 for MTPR instruction
8. Specify R16 and R17 as integer registers for MTPR instruction
9. Replace occurrences of 'Reserved Operand Exception' with 'Illegal PALcode Operand
Trap'
10. Clarify that subsettable unprivileged PAL Instructions can individually either be
implemented or cause an Illegal Instruction Trap
11. Change references from 'interrupt' to 'AST' in SWASTEN description, and to
'interrupt or AST' in REI description
12. Add Privileged Instruction exception to those experienced by CFLUSH and DRAINA
13. Correct inconsistent titles for INSQHIQ and INSQTIQ Instructions
14. Tweak MFPR_IPR operation definition
15. Add 'Read System Cycle Counter' PALcode description


Revision 3.0, March 2, 1990 


1. Fix Bug in /D version of REMQUEx and INSQUEx
2. Add stack fixup to REI
3. Add Memory Barrier to interlocked queues
4. Add section on replacement of PALcode
5. Add Cflush
6. Rework IFLUSH to IMB
7. Remove PAST
8. Define which PAL may be subsetted


Revision 2.0, October 4, 1989 


1. Remove test and set/clear interlocked
2. Add deferred addressing to the absolute queues
3. Add drain aborts (DRAINA)
4. Add poll AST (PAST)
5. Remove read/write of inexact exception enable
6. Add CC and FEN to SWPCTX
7. Rework interlocked queues for LDQ/L and STQ/C


Revision 1.0, May 23, 1989


1. First Full Version


Revision 0.0, March 15, 1989


1. Initial Version


Chapter 3


Memory Management (II-A) 


3.1 Introduction 


Memory management consists of the hardware and software that control the allocation and use 
of physical memory. Typically, in a multiprogramming system, several processes may reside 
in physical memory at the same time (see Chapter 4). OpenVMS Alpha uses memory protec- 
tion and multiple address spaces to ensure that one process will not affect other processes or 
the operating system. 


To further improve software reliability, four hierarchical access modes provide memory 
access control. They are, from most to least privileged: kernel, executive, supervisor, and user. 
Protection is specified at the individual page level, where a page may be inaccessible, 
read-only, or read/write for each of the four access modes. Accessible pages can be restricted 
to have only data or instruction access. 


A program uses virtual addresses to access its data and instructions. However, before these vir- 
tual addresses can be used to access memory, they must be translated into physical addresses. 
Memory management software maintains hierarchical tables of mapping information (page 
tables) that keep track of where each virtual page is located in physical memory. The proces- 
sor utilizes this mapping information when it translates virtual addresses to physical addresses. 


Therefore, memory management provides mechanisms for both memory protection and mem- 
ory mapping. The OpenVMS Alpha memory management architecture is designed to meet 
several goals: 


• Provide a large address space for instructions and data
• Allow programs to run on hardware with physical memory smaller than the virtual
memory used
• Provide convenient and efficient sharing of instructions and data
• Allow sparse use of a large address space without excessive page table overhead
• Contribute to software reliability
• Provide independent read and write access protection


3.2 Virtual Address Space


A virtual address is a 64-bit unsigned integer that specifies a byte location within the virtual
address space. Implementations subset the address space supported to one of several sizes, as a
function of page size and page table depth. The minimal virtual address size supported is 43
bits. If an implementation supports fewer than 64 virtual address bits, it must check that all the
VA<63:VA_SIZE> bits are equal to VA<VA_SIZE-1>. That gives two disjoint ranges for
valid virtual addresses. For example, for a 43-bit virtual address space, valid virtual address
ranges are 0...3FF FFFF FFFF₁₆ and FFFF FC00 0000 0000₁₆...FFFF FFFF FFFF FFFF₁₆.
Accesses to virtual addresses outside of the valid virtual address ranges for an implementation
cause an access violation exception.
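
The sign-extension rule can be checked with a few lines of Python. This is a sketch of the validity test only, assuming the virtual address is given as an unsigned 64-bit integer; the function name `is_valid_va` is illustrative.

```python
def is_valid_va(va: int, va_size: int) -> bool:
    """True if VA<63:va_size> all equal VA<va_size-1> (43 <= va_size <= 64)."""
    upper = va >> (va_size - 1)                 # bits <63:va_size-1>
    all_ones = (1 << (64 - va_size + 1)) - 1    # the same bits, all set
    return upper == 0 or upper == all_ones

# The two valid ranges for a 43-bit virtual address space:
low_top = 0x000003FFFFFFFFFF
high_bottom = 0xFFFFFC0000000000
```

Addresses just outside those ranges, such as `low_top + 1` and `high_bottom - 1`, fail the check and would cause an access violation.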


The virtual address space is broken into pages, which are the units of relocation, sharing, and 
protection. The page size ranges from 8K bytes to 64K bytes. System software should, there- 
fore, allocate regions with differing protection on 64K-byte virtual address boundaries to 
ensure image compatibility across all Alpha implementations.


Memory management provides the mechanism to map the active part of the virtual address
space to the available physical address space. The operating system controls the
virtual-to-physical address mapping tables and saves the inactive parts of the virtual address
space on external storage media.


3.3 Virtual Address Format 


The processor generates a 64-bit virtual address for each instruction and operand in memory. 
The virtual address consists of three or four level-number fields and a byte_within_page field, 
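as shown in Figures 3-1 and 3-2.


Figure 3-1: Virtual Address Format, Three-Level Mode

[Figure not reproduced: a 64-bit virtual address, bits 63 through 0, containing the Level1, Level2, Level3, and byte_within_page fields.]


Figure 3-2: Virtual Address Format, Four-Level Mode

[Figure not reproduced: a 64-bit virtual address, bits 63 through 0, containing the Level0*, Level1, Level2, Level3, and byte_within_page fields.]

* Level0<M:L+1> contains SEXT(VA<L>), where L is the highest numbered implemented VA bit.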





The byte_within_page field can be either 13, 14, 15, or 16 bits depending on a particular 
implementation. Thus, the allowable page sizes are 8K bytes, 16K bytes, 32K bytes, and 64K 
bytes. Each level-number field contains n bits, where n is, for example, 10 with an 8K-byte 
page size. The level-number fields are the same size for a given implementation. 


Implementations must support a mode of operation such that the virtual address format
consists of at least three level-number fields (Level1, Level2, Level3) and a byte_within_page
field. Optionally, implementations may support an extended mode of operation, such that the
virtual address format includes a fourth level-number field, Level0. Determination of a three-
versus four-level mode of operation occurs during system bootstrap. The selected mode affects
all processes identically and remains in effect until the next bootstrap.


An implementation that supports the fourth level-number field may further subset the sup- 
ported address space to include only a subset of low-order bits within that field. That subset 
must be at least two bits¹, and may be as large as n bits, where n is the full bit count of any
given level-number field. The most significant bit in the chosen subset is sign-extended to 
VA<63> for any valid virtual address. 


The level-number fields are a function of the page size; all page table entries at any given level 
do not exceed one page. The PFN field in the PTE is always 32 bits wide. Thus, as the page 
size grows, the virtual and physical address size also grows (Table 3-1). 
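
The relationship between page size and address size can be reproduced with a short calculation. The Python sketch below assumes 8-byte (quadword) PTEs that fill exactly one page per table, a 32-bit PFN, and a Level0 subset of at least two bits, all as stated in this chapter; the function name `va_options` is illustrative.

```python
def va_options(page_bits: int):
    """Return (level_bits, three_level_va, (four_level_min, four_level_max), pa_bits).

    page_bits is the size of the byte_within_page field (13..16).
    """
    level_bits = page_bits - 3                   # 2**page_bits / 8 PTEs per page
    three_level = 3 * level_bits + page_bits     # Level1..Level3 + byte offset
    four_min = three_level + 2                   # Level0 subset of at least 2 bits
    four_max = min(three_level + level_bits, 64) # full Level0, capped at 64 bits
    pa_bits = 32 + page_bits                     # 32-bit PFN + byte offset
    return level_bits, three_level, (four_min, four_max), pa_bits

rows = {size: va_options(bits) for size, bits in
        [("8K", 13), ("16K", 14), ("32K", 15), ("64K", 16)]}
```

The 64K-byte row is the only one where the cap applies, which is why the Level 0 page table is not fully utilized for that page size.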


Table 3-1: Virtual Address Options


Page Size   Byte Offset   Level Size   Virtual Address   Physical Address
(bytes)     (bits)        (bits)       (bits¹)           (bits)

8K          13            10           43, 45-53         45
16K         14            11           47, 49-58         46
32K         15            12           51, 53-63         47
64K         16            13           55, 57-64²        48


¹ Bit counts for three or four levels, respectively (VA_SIZE).
² Level 0 page table not fully utilized for this page size.


3.4 Physical Address Space 


Physical addresses are at most 48 bits. A processor may choose to implement a smaller physi- 
cal address space by not implementing some number of high-order bits. 


The two most significant implemented physical address bits delineate the four regions in the 
physical address space. Implementations use these bits as appropriate for their systems. For 
example, in a workstation with a 30-bit physical address space, bit <29> might select between 
memory and non-memory-like regions, and bit <28> could enable or disable caching. See
Common Architecture (I), Chapter 5. 


3.5 Memory Management Control 


Memory management is always enabled. Implementations must provide an environment for 
PALcode to service exceptions and to initialize and boot the processor. For example, PALcode 
might run with I-stream mapping disabled and use the privileged CALL_PAL LDQP and
STQP instructions to access data stored in physical addresses. 


¹ OpenVMS requires at least three PTEs in the highest-level page table. The lowest-order PTE must
map process space, the highest-order PTE must map system space, and the penultimate PTE maps the
page table structure. See Section 3.8.2.


3.6 Page Table Entries


The processor uses a quadword Page Table Entry (PTE), as shown in Figure 3-3, to translate 
virtual addresses to physical addresses. A PTE contains hardware and software control infor- 
mation and the physical Page Frame Number. 


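Figure 3-3: Page Table Entry

[Figure not reproduced: quadword PTE layout. Bits <63:32> hold the PFN, bits <31:16> are reserved for software, and bits <15:0> hold the control bits listed in Table 3-2.]


Fields in the page table entry are interpreted as shown in Table 3-2.


Table 3-2: Page Table Entry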


Bits Description 


63-32 Page Frame Number (PFN) 
The PFN field always points to a page boundary. If V is set, the PFN is concatenated
with the byte_within_page bits of the virtual address to obtain the physical
address (see Section 3.8). If V is clear, this field may be used by software. 


31-16 Reserved for software. 


15 User Write Enable (UWE) 
This bit enables writes from user mode. If this bit is a 0 and a STORE is attempted
while in user mode, an Access Violation occurs. This bit is valid even when V=0. 


Note: 


If a write-enable bit is set and the corresponding read-enable bit is 
not, the operation of the processor is UNDEFINED. 


14 Supervisor Write Enable (SWE) 
This bit enables writes from supervisor mode. If this bit is a 0 and a STORE is 
attempted while in supervisor mode, an Access Violation occurs. This bit is valid 
even when V=0.


13 Executive Write Enable (EWE) 
This bit enables writes from executive mode. If this bit is a 0 and a STORE is
attempted while in executive mode, an Access Violation occurs. This bit is valid 
even when V=0. 


12 Kernel Write Enable (KWE)
This bit enables writes from kernel mode. If this bit is a 0 and a STORE is 
attempted while in kernel mode, an Access Violation occurs. This bit is valid even 
when V=0. 


11 User Read Enable (URE) 
This bit enables reads from user mode. If this bit is a 0 and a LOAD or instruction
fetch is attempted while in user mode, an Access Violation occurs. This bit is
valid even when V=0.


10 Supervisor Read Enable (SRE) 
This bit enables reads from supervisor mode. If this bit is a 0 and a LOAD or 
instruction fetch is attempted while in supervisor mode, an Access Violation 
occurs. This bit is valid even when V=0. 


9 Executive Read Enable (ERE) 
This bit enables reads from executive mode. If this bit is a 0 and a LOAD or 
instruction fetch is attempted while in executive mode, an Access Violation 
occurs. This bit is valid even when V=0. 


8 Kernel Read Enable (KRE) 
This bit enables reads from kernel mode. If this bit is a 0 and a LOAD or instruc- 
tion fetch is attempted while in kernel mode, an Access Violation occurs. This bit 
is valid even when V=0. 


7 Reserved for future use by DIGITAL.


Programming Note: 
The reserved bit will be used by future hardware systems and should 
not be used by software even if PTE<V> is clear. 


6-5 Granularity hint (GH)
Software may set these bits to a non-zero value to supply a hint to translation 
buffer implementations that a block of pages can be treated as a single larger page: 


1. The block is an aligned group of 8**N pages, where N is the value of 
PTE<6:5>, that is, a group of 1, 8, 64, or 512 pages starting at a virtual 
address with page_size + 3*N low-order zeros. 


2. The block is a group of physically contiguous pages that are aligned both 
virtually and physically. Within the block, the low 3*N bits of the PFNs 
describe the identity mapping and the high 32-3*N PFN bits are all equal.


3. Within the block, all PTEs have the same values for bits <15:0>, that is, 
protection, fault, granularity, and valid bits. 


Hardware may use this hint to map the entire block with a single TB 
entry, instead of 8, 64, or 512 separate TB entries. 


It is UNPREDICTABLE which PTE values within the block are used if 
the granularity bits are set inconsistently. 


Programming Note: 
A granularity hint might be appropriate for a large memory structure 
such as a frame buffer or nonpaged pool that, in fact, is mapped into 
contiguous virtual pages with identical protection, fault, and valid 
bits. 


4 Address Space Match (ASM) 
When set, this PTE matches all Address Space Numbers. For a given VA, ASM
must be set consistently in all processes; otherwise, the address mapping is
UNPREDICTABLE.


3 Fault on Execute (FOE) 
When set, a Fault on Execute exception occurs on an attempt to execute an 
instruction in the page. 


2 Fault on Write (FOW) 
When set, a Fault on Write exception occurs on an attempt to write any location in 
the page. 


1 Fault on Read (FOR) 
When set, a Fault on Read exception occurs on an attempt to read any location in 
the page. 


0 Valid (V) 
Indicates the validity of the PFN field. When V is set, the PFN field is valid for
use by hardware. When V is clear, the PFN field is reserved for use by software. 
The V bit does not affect the validity of PTE<15:1> bits. 
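
The bit assignments in Table 3-2 can be summarized as a small decoder. The Python sketch below simply extracts each field from a 64-bit PTE value using the bit positions in the table; the names `PTE_FLAG_BITS`, `decode_pte`, and `gh_block_pages` are illustrative.

```python
# Single-bit fields of the PTE, keyed by the names used in Table 3-2.
PTE_FLAG_BITS = {
    "V": 0, "FOR": 1, "FOW": 2, "FOE": 3, "ASM": 4,
    "KRE": 8, "ERE": 9, "SRE": 10, "URE": 11,
    "KWE": 12, "EWE": 13, "SWE": 14, "UWE": 15,
}

def decode_pte(pte: int) -> dict:
    """Split a quadword PTE into its named fields."""
    fields = {name: (pte >> bit) & 1 for name, bit in PTE_FLAG_BITS.items()}
    fields["GH"] = (pte >> 5) & 0x3            # granularity hint, bits <6:5>
    fields["SW"] = (pte >> 16) & 0xFFFF        # reserved for software, <31:16>
    fields["PFN"] = (pte >> 32) & 0xFFFFFFFF   # page frame number, <63:32>
    return fields

def gh_block_pages(gh: int) -> int:
    """Number of pages in a granularity-hint block: 8**N for N = PTE<6:5>."""
    return 8 ** gh

# Example: valid kernel read/write page, PFN 0x1234, GH = 2 (64-page block).
example = decode_pte((0x1234 << 32) | (1 << 12) | (1 << 8) | (2 << 5) | 1)
```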


3.6.1 Changes to Page Table Entries 


The operating system changes PTEs as part of its memory management functions. For exam- 
ple, the operating system may set or clear the valid bit, change the PFN field as pages are 
moved to and from external storage media, or modify the software bits. The processor hard- 
ware never changes PTEs. 


Software must guarantee that each PTE is always internally consistent. Changing a PTE one 
field at a time may give incorrect system operation, for example, setting PTE<V> with one 
instruction before establishing PTE<PFN> with another. Execution of an interrupt service rou- 
tine between the two instructions could use an address that would map using the inconsistent 
PTE. Software can solve this problem by building a complete new PTE in a register and then 
moving the new PTE to the page table using a Store Quadword instruction (STQ). 


Multiprocessing complicates the problem. Another processor could be reading (or even
changing) the same PTE that the first processor is changing. Such concurrent access must produce
consistent results. Software must use some form of software synchronization to modify PTEs
that are already valid. Once a processor has modified a valid PTE, it is possible that other
processors in a multiprocessor system may have old copies of that PTE in their Translation
Buffer. Software must notify other processors of changes to PTEs.


Software may write new values into invalid PTEs using quadword store instructions (STQ). 
Hardware must ensure that aligned quadword reads and writes are atomic operations. The fol- 
lowing procedure must be used to change any of the PTE bits <15:0> of a shared valid PTE 
(PTE<0>=1) such that an access that was allowed before the change is not allowed after the 
change. 


1. The PTE<0> is cleared without changing any of the PTE bits <63:32> and <15:1>. 


2. All processors do a TBIS for the VA mapped by the PTE that changed. The VA used in
the TBIS must assume that the PTE granularity hint bits are zero.


3. After all processors have done the TBIS, the new PTE may be written changing any or
all fields.
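
The three steps above can be sketched as a toy model in Python. The page table is represented as a dictionary from virtual page address to PTE value and each processor's translation buffer as a set of cached virtual page addresses; both representations, and the function name, are assumptions made for illustration. Real software would also need the interprocessor signaling and synchronization that this model omits.

```python
def restrict_shared_pte(page_table, translation_buffers, va, new_pte):
    """Apply the clear-valid, TBIS-everywhere, rewrite sequence to one PTE.

    Returns the intermediate PTE value written in step 1.
    """
    # Step 1: clear PTE<0> without changing bits <63:32> and <15:1>.
    cleared = page_table[va] & ~1
    page_table[va] = cleared

    # Step 2: every processor invalidates its TB entry for this VA (TBIS).
    for tb in translation_buffers:
        tb.discard(va)

    # Step 3: write the complete new PTE with a single quadword store.
    page_table[va] = new_pte
    return cleared

# Example: remove kernel write access (bit 12) from a valid shared page.
page_table = {0x2000: (5 << 32) | 0x1101}
tbs = [{0x2000}, {0x2000, 0x4000}]
intermediate = restrict_shared_pte(page_table, tbs, 0x2000, (5 << 32) | 0x0101)
```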


Programming Note: 


The procedure above allows queue instructions that have probed in order to check that all 
can complete, to service a TB miss. The queue instructions use the PTE even though the V 
bit is clear, if the V bit was set during the instruction’s initial probe flow. 


3.7 Memory Protection 


Memory protection is the function of validating whether a particular type of access is allowed 
to a specific page from a particular access mode. Access to each page is controlled by a protec- 
tion code that specifies, for each access mode, whether read or write references are allowed. 
The processor uses the following to determine whether an intended access is allowed: 

• The virtual address, which is used to index page tables

• The intended access type (read data, write data, or instruction fetch)

• The current access mode from the Processor Status


If the access is allowed and the address can be mapped (the Page Table Entry is valid), the
result is the physical address that corresponds to the specified virtual address.


For protection checks, the intended access is read for data loads and instruction fetch, and 
write for data stores. 


If an operand is an address operand, then no reference is made to memory. Hence, the page 
need not be accessible nor map to a physical page. 


3.7.1 Processor Access Modes 


There are four processor modes:

• Kernel
• Executive
• Supervisor
• User


The access mode of a running process is stored in the Current Mode bits of the Processor Sta- 
tus (PS) (see Section 6-2). 


3.7.2 Protection Code 


Every page in the virtual address space is protected according to its use. A program may be 
prevented from reading or writing portions of its address space. Each page has an associated 
protection code that describes the accessibility of the page for each processor mode. The code 
allows a choice of read or write protection for each processor mode. 


• Each mode’s access can be read/write, read-only, or no-access.
• Read and write accessibility are specified independently.
• The protection of each mode can be specified independently.


The protection code is specified by 8 bits in the PTE (see Table 3-2). 


The OpenVMS Alpha architecture allows a page to be designated as execute only by setting 
the read enable bit for the access mode and by setting the fault on read and write bits in the 
PTE.


3.7.3 Access Violation Fault 


An Access Violation fault occurs if an illegal access is attempted, as determined by the current 
processor mode and the page’s protection field. 
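
The protection check reduces to testing one enable bit in the PTE. The Python sketch below assumes the PS<CM> encoding 0 = kernel, 1 = executive, 2 = supervisor, 3 = user, which lines up with the read-enable bits <11:8> and write-enable bits <15:12> of Table 3-2; the function name is illustrative, and the fault-on-read/write/execute bits are deliberately not modeled.

```python
MODES = {"kernel": 0, "executive": 1, "supervisor": 2, "user": 3}

def access_allowed(pte: int, mode: str, write: bool) -> bool:
    """True if the protection code in the PTE permits the access.

    Reads and instruction fetches use the read-enable bits <11:8>;
    writes use the write-enable bits <15:12>.
    """
    base = 12 if write else 8
    return bool((pte >> (base + MODES[mode])) & 1)

# Example protection code: kernel read/write, user read-only.
KRE, URE, KWE = 1 << 8, 1 << 11, 1 << 12
pte = KRE | KWE | URE
```

An access for which `access_allowed` returns False corresponds to an Access Violation fault.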


3.8 Address Translation 


The page tables can be accessed from physical memory, or (to reduce overhead) through a 
mapping to a linear region of the virtual address space. All implementations must support the 
virtual access method and are expected to use it as the primary access method to enhance
performance. 


The following sections describe both access methods. 


3.8.1 Physical Access for Page Table Entries 


Physical address translation is performed by accessing entries in a multilevel page table
structure. The Page Table Base Register (PTBR) contains the physical Page Frame Number (PFN)
of the highest-level page table. If the system was booted with three levels of page table, this is
the Level 1 page table. If the system was booted with four levels of page table, this is the
Level 0 page table. In that case, bits <Level0> of the virtual address are used to index into the
Level 0 page table to obtain the physical PFN of the base of the Level 1 page table.


With either a three-level or four-level page table, bits <Level1> of the virtual address are used
to index into the Level 1 page table to obtain the physical PFN of the base of the next level
(Level 2) page table. Bits <Level2> of the virtual address are used to index into the Level 2
page table to obtain the physical PFN of the base of the next level (Level 3) page table. Bits
<Level3> of the virtual address are used to index into the Level 3 page table to obtain the
physical PFN of the page being referenced. The PFN is concatenated with virtual address bits
<byte_within_page> to obtain the physical address of the location being accessed.


If part of any page table resides in I/O space, or in nonexistent memory, the operation of the 
processor is UNDEFINED. 
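
A three-level walk can be sketched in Python as follows. This is a simplified model rather than the architected algorithm: physical memory is supplied as a hypothetical `read_quad(physical_address)` callable, an invalid PTE simply raises `LookupError`, and protection checks and fault classification are omitted.

```python
def split_va(va: int, page_bits: int = 13):
    """Return (level1, level2, level3, byte_within_page) for a three-level VA."""
    n = page_bits - 3                       # bits per level-number field
    mask = (1 << n) - 1
    offset = va & ((1 << page_bits) - 1)
    level3 = (va >> page_bits) & mask
    level2 = (va >> (page_bits + n)) & mask
    level1 = (va >> (page_bits + 2 * n)) & mask
    return level1, level2, level3, offset

def translate(va: int, ptbr: int, read_quad, page_bits: int = 13) -> int:
    """Walk Level 1, 2, and 3 page tables starting from the PFN in PTBR."""
    level1, level2, level3, offset = split_va(va, page_bits)
    pfn = ptbr
    for index in (level1, level2, level3):
        pte = read_quad((pfn << page_bits) + 8 * index)
        if not pte & 1:                     # PTE<V> clear
            raise LookupError("PTE not valid")
        pfn = pte >> 32                     # PFN of next table, or of the page
    return (pfn << page_bits) | offset

# Toy memory: Level 1 table in PFN 1, Level 2 in PFN 2, Level 3 in PFN 3.
memory = {
    (1 << 13) + 8 * 1: (2 << 32) | 1,
    (2 << 13) + 8 * 2: (3 << 32) | 1,
    (3 << 13) + 8 * 3: (0x10 << 32) | 1,
}
va = (1 << 33) | (2 << 23) | (3 << 13) | 0x123
pa = translate(va, ptbr=1, read_quad=lambda addr: memory[addr])
```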


If all the higher-level PTEs (those PTEs that map higher-significance portions of the virtual
address space than is mapped by Level 3) are valid, the protection bits of those PTEs are
ignored; the protection code in the Level 3 PTE is used to determine accessibility. If a
higher-level PTE is invalid, an Access Violation fault occurs if the PTE<KRE> equals zero. An
Access Violation fault on any higher-level PTE implies that all lower-level page tables mapped
by that PTE do not exist.
Programming Note:


This mapping scheme does not require multiple contiguous physical pages. There are no
length registers. With a page size of 8K bytes and three levels of page table, 3 pages (24K
bytes) map 8M bytes of virtual address space; 1026 pages (approximately 8M bytes) map
an 8G-byte address space; and 1,049,601 pages (approximately 8G bytes) map the entire
8T-byte (2**43 byte) address space.
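
The page counts in this note can be checked with a short calculation. The following is an illustrative Python sketch, not part of the architecture: the function name `pages_to_map` is hypothetical, and it assumes 8-byte PTEs, a three-level page table, and a contiguous, naturally aligned region.

```python
def pages_to_map(region_bytes, page_size=8192, pte_size=8):
    """Count page-table pages (Level 1 + Level 2 + Level 3) needed to map
    a contiguous, naturally aligned region with a three-level table."""
    ptes_per_page = page_size // pte_size          # 1024 for 8K-byte pages
    data_pages = -(-region_bytes // page_size)     # ceiling division
    level3 = -(-data_pages // ptes_per_page)       # Level 3 page tables
    level2 = -(-level3 // ptes_per_page)           # Level 2 page tables
    level1 = 1                                     # single top-level table
    return level1 + level2 + level3


for size in (8 << 20, 8 << 30, 1 << 43):
    print(size, pages_to_map(size))
```

Under these assumptions the three regions named in the note (8M, 8G, and 2**43 bytes) need 3, 1026, and 1,049,601 page-table pages respectively.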


The algorithm to generate a physical address from a virtual address follows: 


IF {SEXT(VA<63:VA_SIZE>) NEQ SEXT(VA<VA_SIZE-1>)} THEN
    {initiate Access Violation fault}


IF {booted with 4 levels of page table} THEN

    ! Read Physical
    level0_pte <- ({PTBR * page_size} + {8 * VA<level0>})

    IF level0_pte<V> EQ 0 THEN
        IF level0_pte<KRE> EQ 0 THEN
            {initiate Access Violation fault}
        ELSE
            {initiate Translation Not Valid fault}

    ! Read Physical
    level1_pte <- ({level0_pte<PFN> * page_size} + {8 * VA<level1>})

ELSE

    ! Read Physical
    level1_pte <- ({PTBR * page_size} + {8 * VA<level1>})


IF level1_pte<V> EQ 0 THEN
    IF level1_pte<KRE> EQ 0 THEN
        {initiate Access Violation fault}
    ELSE
        {initiate Translation Not Valid fault}


! Read Physical
level2_pte <- ({level1_pte<PFN> * page_size} + {8 * VA<level2>})


IF level2_pte<V> EQ 0 THEN
    IF level2_pte<KRE> EQ 0 THEN
        {initiate Access Violation fault}
    ELSE
        {initiate Translation Not Valid fault}


! Read Physical
level3_pte <- ({level2_pte<PFN> * page_size} + {8 * VA<level3>})


IF {{{level3_pte<UWE> EQ 0} AND {write access} AND {PS<CM> EQ 3}} OR
    {{level3_pte<URE> EQ 0} AND {read access} AND {PS<CM> EQ 3}} OR
    {{level3_pte<SWE> EQ 0} AND {write access} AND {PS<CM> EQ 2}} OR
    {{level3_pte<SRE> EQ 0} AND {read access} AND {PS<CM> EQ 2}} OR
    {{level3_pte<EWE> EQ 0} AND {write access} AND {PS<CM> EQ 1}} OR
    {{level3_pte<ERE> EQ 0} AND {read access} AND {PS<CM> EQ 1}} OR
    {{level3_pte<KWE> EQ 0} AND {write access} AND {PS<CM> EQ 0}} OR
    {{level3_pte<KRE> EQ 0} AND {read access} AND {PS<CM> EQ 0}}}
THEN
    {initiate Access Violation fault}
ELSE
    IF level3_pte<V> EQ 0 THEN
        {initiate Translation Not Valid fault}


IF {level3_pte<FOW> EQ 1} AND {write access} THEN
    {initiate Fault On Write fault}

IF {level3_pte<FOR> EQ 1} AND {read access} THEN
    {initiate Fault On Read fault}

IF {level3_pte<FOE> EQ 1} AND {execute access} THEN
    {initiate Fault On Execute fault}


Physical_Address <- {level3_pte<PFN> * page_size} OR VA<byte_within_page>
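

The walk above can be modeled in a few lines of Python. This is a minimal sketch for illustration only, under several assumptions that are not stated in this section: 8K-byte pages, PTE bit positions V=<0>, FOR=<1>, FOW=<2>, FOE=<3>, read enables at <11:8> and write enables at <15:12> (kernel through user), physical memory modeled as a dictionary from physical address to 64-bit PTE, and instruction fetch checked against the read enable. Missing dictionary entries read as zero, which is a simplification; the architecture leaves page tables in nonexistent memory UNDEFINED. The names `translate`, `Fault`, `bit`, and `pfn` are hypothetical.

```python
PAGE_SIZE = 8192                 # assumed 8K-byte pages
OFFSET_BITS = 13                 # lg(PAGE_SIZE)
LEVEL_BITS = 10                  # lg(PAGE_SIZE / 8) index bits per level

V, FOR, FOW, FOE = 0, 1, 2, 3    # assumed PTE bit positions
KRE = 8                          # read enable for mode m is bit KRE + m
KWE = 12                         # write enable for mode m is bit KWE + m


class Fault(Exception):
    """Raised with the fault mnemonic: ACV, TNV, FOR, FOW, or FOE."""


def bit(pte, n):
    return (pte >> n) & 1


def pfn(pte):
    return pte >> 32             # PTE<63:32>


def translate(mem, ptbr, va, mode, access, levels=3):
    """Translate 64-bit `va`. `mode` is PS<CM> (0 = kernel ... 3 = user);
    `access` is 'read', 'write', or 'execute'."""
    va_size = OFFSET_BITS + levels * LEVEL_BITS
    # VA<63:va_size> must be a sign extension of VA<va_size-1>.
    top = va >> (va_size - 1)
    if top not in (0, (1 << (64 - va_size + 1)) - 1):
        raise Fault("ACV")

    table_pfn = ptbr
    for level in range(levels):
        shift = OFFSET_BITS + (levels - 1 - level) * LEVEL_BITS
        index = (va >> shift) & ((1 << LEVEL_BITS) - 1)
        pte = mem.get(table_pfn * PAGE_SIZE + 8 * index, 0)  # Read Physical
        if level < levels - 1:
            # Higher-level PTE: only V and KRE are examined.
            if not bit(pte, V):
                raise Fault("ACV" if not bit(pte, KRE) else "TNV")
            table_pfn = pfn(pte)

    # Level 3 PTE: protection is checked before validity.
    enable = KWE if access == "write" else KRE
    if not bit(pte, enable + mode):
        raise Fault("ACV")
    if not bit(pte, V):
        raise Fault("TNV")
    for name, flag in (("write", FOW), ("read", FOR), ("execute", FOE)):
        if access == name and bit(pte, flag):
            raise Fault("FO" + name[0].upper())
    return pfn(pte) * PAGE_SIZE | (va & (PAGE_SIZE - 1))
```

Passing `levels=4` makes the first iteration index the Level 0 table through PTBR, matching the four-level branch of the algorithm.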


3.8.2 Virtual Access for Page Table Entries 


To reduce the overhead associated with the address translation in a multilevel page table struc- 
ture, the page tables are mapped into a linear region of the virtual address space. The virtual 
address of the base of the page table structure is set on a system-wide basis and is contained in 
the VPTB IPR. 


When a native mode DTB or ITB miss occurs, the TBMISS flows attempt to load the Level 3 
page table entry using a single virtual mode load instruction. 


The algorithm involving the manipulation of the missing VA follows, where L_c represents
Level_count and pS represents pageSize:


! If booted with 3 level fields in the VA format, L_c = 3
! If booted with 4 level fields in the VA format, L_c = 4


tmp <- LEFT_SHIFT (va, {64 - {{lg(pS)*{L_c+1}} - {L_c*3}}})
tmp <- RIGHT_SHIFT (tmp, {64 - {{lg(pS)*{L_c+1}} - {L_c*3}} + lg(pS) - 3})
tmp <- VPTB OR tmp
tmp<2:0> <- 0


At this point, tmp contains the VA of the Level 3 page table entry. An LDQ from that VA will
result in the acquisition of the PTE needed to satisfy the initial TBMISS condition.
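

The shift sequence can be sketched in Python to see what it computes. This is an illustrative model, not PALcode; the function name `vpte_address` is hypothetical, and 64-bit register behavior is imitated by masking after the left shift.

```python
MASK64 = (1 << 64) - 1


def vpte_address(va, vptb, page_size=8192, level_count=3):
    """Return the virtual address of the Level 3 PTE that maps `va`."""
    lg_ps = page_size.bit_length() - 1                      # lg(pS)
    va_bits = lg_ps * (level_count + 1) - level_count * 3   # implemented VA bits
    tmp = (va << (64 - va_bits)) & MASK64                   # drop sign-extension bits
    tmp >>= (64 - va_bits) + lg_ps - 3                      # virtual page number * 8
    tmp |= vptb
    return tmp & ~0x7                                       # tmp<2:0> <- 0
```

With 8K-byte pages and three levels, the left shift is 21 bits and the right shift is 31 bits, so VA<42:13> lands in bits <32:3> of the result. For example, `vpte_address(0x2000, 1 << 33)` selects the second quadword of the virtual page table, at offset 8 from the VPTB value.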


However, in the PALcode environment, if a TBMISS occurs during an attempt to fetch the 
Level 3 PTE, it is necessary to use the longer sequence of multiple dependent loads described 
in Section 3.8.1. 
Chapter 5 contains the description of the VPTB IPR used to contain the virtual address of the
base of the page table structure.


The necessary mapping of the page tables for the correct function of the algorithm is done as 
follows. In the algorithm, if the system is booted with three level fields in the virtual address 
format, Level_count=3. If the system is booted with four level fields in the virtual address for- 
mat, Level_count=4. 


1. Select a 2**((Level_count*lg(pageSize/8))+3) byte-aligned region (an address with
(Level_count*lg(pageSize/8))+3 low-order zeros) in the virtual address space. This
value will be written into the VPTB register.


2. Create a PTE to map the page tables as follows: 


PTE = 0                  ! Initialize all fields to zero
PTE<63:32> = pfn_of_most_significant_pagetable
                         ! Set the PFN to the PFN of
                         ! the most significant pagetable
PTE<8> = 1               ! Set the kernel read enable bit
PTE<0> = 1               ! Set the valid bit


3. Write the created PTE into the page table entry that corresponds to the VPTB value. If 
operating in the mode of three levels of page table, this is the Level 1 page table. If 
operating in the mode of four levels of page table, this is the Level 0 page table. 


4. Set all higher-level, Valid PTEs that map the Level 3 page tables to allow kernel read 
access. 


5. Write the VPTB register with the selected base value.
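

Steps 1 through 3 can be sketched as follows. This is an illustrative Python model under the assumption of 8-byte PTEs; the function names are hypothetical, and writing the PTE into memory and loading the VPTB register (steps 3 and 5) are left to the caller.

```python
def vptb_alignment_bits(page_size=8192, level_count=3):
    """Step 1: number of low-order zero bits required in the VPTB value."""
    lg_ptes = (page_size // 8).bit_length() - 1    # lg(pageSize/8)
    return level_count * lg_ptes + 3


def self_map_pte(top_level_pfn):
    """Step 2: PFN in PTE<63:32>, kernel read enable (<8>) and valid (<0>)."""
    pte = 0
    pte |= top_level_pfn << 32
    pte |= 1 << 8
    pte |= 1 << 0
    return pte


def self_map_index(vptb, page_size=8192, level_count=3):
    """Step 3: index of the top-level PTE that corresponds to the VPTB value."""
    lg_ps = page_size.bit_length() - 1
    lg_ptes = lg_ps - 3
    shift = lg_ps + (level_count - 1) * lg_ptes
    return (vptb >> shift) & ((1 << lg_ptes) - 1)
```

With 8K-byte pages and three levels the region is aligned on 33 low-order zero bits (8G bytes), and a VPTB value of `1 << 33` corresponds to entry 1 of the Level 1 page table.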


Note: 


No validity checks need be made on the value stored in the VPTB in a running system. 
Therefore, if the VPTB contains an invalid address, the operation is UNDEFINED. 


3.9 Translation Buffer 


In order to save actual memory references when repeatedly referencing the same pages, hard- 
ware implementations include a translation buffer to remember successful virtual address 
translations and page states. 


When the process context is changed, a new value is loaded into the Address Space Number
(ASN) internal processor register with a Swap Privileged Context instruction (CALL_PAL
SWPCTX). (See Section 2.6 and Chapter 4.) This causes address translations for pages with
PTE<ASM> clear to be invalidated on a processor that does not implement address space
numbers. Additionally, when the software changes any part (except for the Software field) of a
valid Page Table Entry, it must also move a virtual address within the corresponding page to
the Translation Buffer Invalidate Single (TBIS) internal processor register with the MTPR
instruction (see Chapter 5).
Implementation Note:


Some implementations may invalidate the entire Translation Buffer on an MTPR to TBIS. 
In general, implementations may invalidate more than the required translations in the TB. 


The entire Translation Buffer can be invalidated by writing to the Translation Buffer
Invalidate All register (CALL_PAL MTPR_TBIA), and all ASM=0 entries can be invalidated by
writing to the Translation Buffer Invalidate All Process register (CALL_PAL
MTPR_TBIAP). (See Chapter 5.)


The Translation Buffer must not store invalid PTEs. Therefore, the software is not required to 
invalidate Translation Buffer entries when making changes for PTEs that are already invalid. 


After software changes a valid zero-, first-, or second-level PTE, software must flush the
translation for the corresponding page in the virtual page table. Then software must flush the
translations of all valid pages mapped by that page. In the case of a change to a first-level
PTE, this action must be taken through a second iteration. In the case of a change to a
zero-level PTE, this action must be taken through a second and third iteration.


The TBCHK internal processor register is available for interrogating the presence of a valid 
translation in the Translation Buffer (see Chapter 5). 


Implementation Note: 


Hardware implementors should be aware that a single, direct-mapped TB has a potential 
problem when a load/store instruction and its data map to the same TB location. If TB 
misses are handled in PALcode, there could be an endless loop unless the instruction is 
held in an instruction buffer or a translated physical PC is maintained by the hardware. 


3.10 Address Space Numbers 


The Alpha architecture allows a processor to optionally implement address space numbers 
(process tags) to reduce the need for invalidation of cached address translations for process 
specific addresses when a context switch occurs. The supported ASN range is 0...MAX_ASN. 
MAX_ASN is provided in the HWRPB MAX_ASN field. See Console Interface (III), Chapter 
2, for a detailed description of the HWRPB. 


Note: 


If an ASN outside of the range 0...MAX_ASN is assigned to a process, the operation of 
the processor is UNDEFINED. 


The address space number for the current process is loaded by software in the Address Space 
Number (ASN) internal processor register with a Swap Privileged Context instruction. ASNs 
are processor specific and the hardware makes no attempt to maintain coherency across multi- 
ple processors. In a multiprocessor system, software is responsible for ensuring the 
consistency of TB entries for processes that might be rescheduled on different processors. 


Systems that support ASNs should have MAX_ASN in the range 13...65535. The number of 
ASNs should be determined by the market a system is targeting. 
Programming Note:


System software should not assume that the number of ASNs is a power of two. This
allows, for example, hardware to use N TB tag bits to encode (2**N)-3 ASN values, one
value for ASM=1 PTEs, and one for invalid.


There are several possible ways of using ASNs that result from several complications in a 
multiprocessor system. Consider the case in which a process that executed on processor 1 
is rescheduled on processor 2. If a page is deleted or its protection is changed, the TB in 
processor 1 has stale data. One solution is to send an interprocessor interrupt to all the 
processors on which this process could have run and cause them to invalidate the changed 
PTE. That results in significant overhead in a system with several processors. Another 
solution is to have software invalidate all TB entries for a process on a new processor 
before it can begin execution, if the process executed on another processor during its 
previous execution. That ensures the deletion of possibly stale TB entries on the new 
processor. A third solution is to assign a new ASN whenever a process is run on a 
processor that is not the same as the last processor on which it ran. 


3.11 Memory Management Faults 


Five types of faults are associated with memory access and protection: 

• Access Control Violation (ACV)
  Taken when the protection field of the third-level PTE that maps the data indicates
  that the intended page reference would be illegal in the specified access mode. An
  Access Control Violation fault is also taken if the KRE bit is zero in an invalid Level
  0 (if one exists), Level 1, or Level 2 PTE.

• Fault on Read (FOR)
  Occurs when a read is attempted with PTE<FOR> set.

• Fault on Write (FOW)
  Occurs when a write is attempted with PTE<FOW> set.

• Fault on Execute (FOE)
  Occurs when instruction execution is attempted with PTE<FOE> set.

• Translation Not Valid (TNV)
  Taken when a read or write reference is attempted through an invalid PTE in a Level 0
  (if one exists), Level 1, Level 2, or Level 3 page table.


See Chapter 6 for a detailed description of these faults. 


Those five faults have distinct vectors in the System Control Block. The Access Violation 
(ACV) fault takes precedence over the faults TNV, FOR, FOW, and FOE. The Translation 
Not Valid (TNV) fault takes precedence over the faults FOR, FOW, and FOE. 


The faults FOR and FOW can occur simultaneously in the CALL_PAL queue instructions, in 
which case the order that the exceptions are taken is UNPREDICTABLE (see Section 2.1). 
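

The precedence rules can be summarized in a small Python sketch. It is illustrative only; the table `PRECEDENCE` and the function `faults_to_take` are hypothetical names, and the sketch returns a set because faults of equal rank (such as FOR and FOW together) have no architected order.

```python
# Lower number = higher precedence. FOR, FOW, and FOE share the lowest rank.
PRECEDENCE = {"ACV": 0, "TNV": 1, "FOR": 2, "FOW": 2, "FOE": 2}


def faults_to_take(detected):
    """Return the subset of `detected` faults with the highest precedence."""
    if not detected:
        return set()
    best = min(PRECEDENCE[f] for f in detected)
    return {f for f in detected if PRECEDENCE[f] == best}
```

For example, `faults_to_take({"TNV", "FOW"})` yields only TNV, while `faults_to_take({"FOR", "FOW"})` yields both, reflecting the UNPREDICTABLE ordering in the queue instructions.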
3.12 \Revision History


Revision 7.0, November 13, 1997 
1. Alpha AXP -> Alpha
2. Added ECO 99, fourth level of page table
3. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994 
1. Added ECO 67 for translation buffer section 
2. Alpha -> Alpha AXP
3. Cleaned up physical address space section (3.3) 


Revision 5.0, May 12, 1992 
1. Added spacing to code_examples 
2. Term level replaces seg in address translation sect 
3. Added ECO #17, address translation performance enhancements 
4. Converted to SDML 
5. Integrate references for Console ECO #15 


Revision 4.0, March 29, 1991 
1. Typos 
2. Clarify reference to TNV and FOx as mutually exclusive 


3. Expand on reference to simultaneous occurrence of FOR and FOW in section ‘Memory 
Management Faults’ 


Revision 3.0, March 2, 1990 
1. Change ASN to variable size 
2. Remove Huge pages and add Granularity hint 
3. Add rule on changing PTEs from valid to invalid 


Revision 2.0, October 4, 1989 
1. Remove references to buffer space 
2. Add note that PTE<6:7> are not to be used by software 
3. Change name of large pages to huge pages. 
4. Add implementation dependent use of high order PFN bits to specify caching policy.


Revision 1.0, May 23, 1989 


1. First review distribution\ 
Chapter 4


Process Structure (II-A) 


4.1 Process Definition 


A process is the basic entity that is scheduled for execution by the processor. A process repre- 
sents a single thread of execution and consists of an address space and both hardware and 
software context. 
The hardware context of a process is defined by: 


• Thirty-one integer registers and 31 floating-point registers
• Processor Status (PS)
• Program Counter (PC)
• Four stack pointers
• Asynchronous System Trap Enable and summary registers (ASTEN, ASTSR)
• Process Page Table Base Register (PTBR)
• Address Space Number (ASN)
• Floating Enable Register (FEN)
• Charged Process Cycles
• Process Unique value
• Data Alignment Trap (DAT)
• Performance Monitoring Enable Register (PME)


The software context of a process is defined by operating system software and is system 
dependent. 


A process may share the same address space with other processes or have an address space of 
its own. There is, however, no separate address space for system software, and therefore, the 
operating system must be mapped into the address space of each process (see Chapter 3). 


In order for a process to execute, its hardware context must be loaded into the integer regis- 
ters, floating-point registers, and internal processor registers. When a process is being 
executed, its hardware context is continuously updated. When a process is not being executed, 
its hardware context is stored in memory. 
Saving the hardware context of the current process in memory, followed by loading the
hardware context for a new process, is termed context switching. Context switching occurs as one
process after another is scheduled by the operating system for execution.


4.2 Hardware Privileged Process Context 
The hardware context of a process is defined by a privileged part that is context switched with
the Swap Privileged Context instruction (SWPCTX) (see Section 2.6), and a nonprivileged 
part that is context switched by operating system software. 


When a process is not executing, its privileged context is stored in a 128-byte naturally 
aligned memory structure called the Hardware Privileged Context Block (HWPCB). (See Fig- 
ure 4-1.) 


Figure 4-1: Hardware Privileged Context Block 
The Hardware Privileged Context Block (HWPCB) for the current process is specified by the
Privileged Context Block Base register (PCBB). (See Chapter 5.) 


The Swap Privileged Context instruction (SWPCTX) saves the privileged context of the cur- 
rent process into the HWPCB specified by PCBB, loads a new value into PCBB, and then 
loads the privileged context of the new process into the appropriate hardware registers. 


The new value loaded into PCBB, as well as the contents of the Privileged Context Block, 
must satisfy certain constraints or an UNDEFINED operation results: 


• The physical address loaded into PCBB must be 128-byte aligned and must describe 16
  contiguous quadwords that are in a memory-like region. (See Common Architecture (I),
  Chapter 5.)




• The value of PTBR must be the Page Frame Number of an existent page that is in a
  memory-like region.


It is the responsibility of the operating system to save and load the nonprivileged part of the 
hardware context. 


The SWPCTX instruction returns ownership of the current HWPCB to operating system
software and passes ownership of the new HWPCB from the operating system to the processor.
Any attempt to write a HWPCB while ownership resides with the processor has UNDEFINED
results. If the HWPCB is read while ownership resides with the processor, it is
UNPREDICTABLE whether the original or an updated value of a field is read. The processor
can update an HWPCB field at any time. The decision as to whether or not a field is updated is
made individually for each field.


If ASNs are not implemented, the ASN field is not read or written by PALcode. 
The FEN bit reflects the setting of the FEN IPR. 


Setting the PME bit alerts any performance hardware or software in the system to monitor the 
performance of this process. 


The DAT bit controls whether data alignment traps that are fixed up in PALcode are reported 
to the operating system. If the bit is clear, the trap is reported. If the bit is set, after the fixup, 
return is to the user. See Section 6.6. 


The Charged Process Cycles is the total number of PCC register counts that are charged to the
process (modulo 2**32). When a process context is loaded by the SWPCTX instruction, the
contents of the PCC count field (PCC_CNT) are subtracted from the contents of
HWPCB[64]<31:0> and the result is written to the PCC offset field (PCC_OFF):


PCC<63:32> <- (HWPCB[64]<31:0> - PCC<31:0>)


When a process context is saved by the SWPCTX instruction, the charged process cycles is 
computed by performing an unsigned add of PCC<63:32> and PCC<31:0>. That value is writ- 
ten to HWPCB[64]<31:0>. 


Software Programming Note: 


The following example returns in R0 the current PCC register count (modulo 2**32) for a
process. Care is taken not to cause an unwanted sign extension.


RPCC R0             ; Read the processor cycle counter
SLL  R0, #32, R1    ; Line up the offset and count fields
ADDQ R0, R1, R0     ; Do add
SRL  R0, #32, R0    ; Zero extend the cycle count to 64 bits
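

The charged-cycle arithmetic can be modeled in Python to confirm that the load, save, and read paths agree. This is an illustrative sketch, not PALcode; the function names are hypothetical, and the 64-bit PCC is represented as an integer with the offset in <63:32> and the count in <31:0>.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def load_context(pcc, hwpcb_cycles):
    """Context load: PCC<63:32> <- HWPCB[64]<31:0> - PCC<31:0>."""
    cnt = pcc & MASK32
    off = (hwpcb_cycles - cnt) & MASK32
    return (off << 32) | cnt


def save_context(pcc):
    """Context save: unsigned add of PCC<63:32> and PCC<31:0>, modulo 2**32."""
    return ((pcc >> 32) + (pcc & MASK32)) & MASK32


def read_charged_cycles(pcc):
    """Model of the four-instruction RPCC sequence on a 64-bit register."""
    r0 = pcc                      # RPCC
    r1 = (r0 << 32) & MASK64      # SLL:  count moves into the high longword
    r0 = (r0 + r1) & MASK64       # ADDQ: high longword becomes offset + count
    return r0 >> 32               # SRL:  zero-extended result
```

If a process is loaded with 50 charged cycles and the counter then advances by 100, both `save_context` and `read_charged_cycles` report 150, regardless of the counter value at load time.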


The Process Unique value is that value used in support of multithread implementations. The 
value is stored in the HWPCB when the process is not active. When the process is active, the 
value may be cached in hardware internal storage or kept only in the HWPCB. 
4.3 Asynchronous System Traps (AST)


Asynchronous System Traps (ASTs) are a means of notifying a process of events that are not 
synchronized with its execution but that must be dealt with in the context of the process with 
minimum delay. 


Asynchronous System Traps (ASTs) interrupt process execution and are controlled by the 
AST Enable (ASTEN) and AST Summary (ASTSR) internal processor registers. (See Chap- 
ter 5.) 


The AST Enable register (ASTEN) contains an enable bit for each of the four processor access 
modes. When the bit corresponding to an access mode is set, ASTs for that mode are enabled. 
The AST enable bit for an access mode may be changed by executing a Swap AST Enable 
instruction (SWASTEN; see Section 2.6), or by executing a Move to Processor Register 
instruction specifying ASTEN (MTPR ASTEN; see Chapter 5). 


The AST Summary Register (ASTSR) contains a pending bit for each of the four processor 
access modes. When the bit corresponding to an access mode is set, an AST is pending for that 
mode. 


Kernel mode software may request an AST for a particular access mode by executing a Move 
to Processor Register instruction specifying ASTSR (MTPR ASTSR; see Chapter 5). 


Hardware or PALcode monitors the state of ASTEN, ASTSR, PS<CM>, and PS<IPL>. If
PS<IPL> is less than 2, and there is an AST pending and enabled for an access mode that is
less than or equal to PS<CM> (that is, an equal or more privileged access mode), an AST is
initiated at IPL 2.


ASTs that are pending and enabled for a less privileged access mode are not allowed to inter- 
rupt execution in a more privileged access mode. 
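

The delivery condition can be expressed as a short Python sketch. It is illustrative only: the function name `ast_to_deliver` is hypothetical, the per-mode bit assignment (bit 0 kernel through bit 3 user) follows the ASTEN and ASTSR layouts in Chapter 5, and choosing the most privileged eligible mode first is an assumption that this section does not state.

```python
def ast_to_deliver(asten, astsr, ps_cm, ps_ipl):
    """Return the access mode (0 = kernel ... 3 = user) of the AST to
    initiate, or None if no AST can be delivered."""
    if ps_ipl >= 2:
        return None                        # ASTs are initiated only below IPL 2
    ready = asten & astsr & 0xF            # pending and enabled, per mode
    for mode in range(ps_cm + 1):          # modes <= PS<CM>
        if ready & (1 << mode):
            return mode
    return None
```

A user-mode AST that is pending and enabled is therefore held off while the processor runs in kernel mode (`ps_cm` of 0) and becomes deliverable once PS<CM> returns to 3 with PS<IPL> below 2.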


4.4 Process Context Switching 


Process context switching occurs as one process after another is scheduled for execution by 
operating system software. Context switching requires the hardware context of one process to 
be saved in memory followed by the loading of the hardware context for another process into 
the hardware registers. 


The privileged hardware context is swapped with the CALL_PAL Swap Privileged Context 
instruction (SWPCTX). Other hardware context must be saved and restored by operating sys- 
tem software. 


The sequence in which process context is changed is important because the SWPCTX instruc- 
tion changes the environment in which the context switching software itself is executing. Also, 
although hardware does not enforce this, it is advisable to execute the actual context switching 
software in an environment that cannot be context switched (that is, at an IPL high enough 
that rescheduling cannot occur). 
The SWPCTX instruction is the only method provided for loading certain internal processor
registers. The SWPCTX instruction always saves the privileged context of the old process and
loads the privileged context of a new process. Therefore, a valid HWPCB must be available
to save the privileged context of the old process as well as load the privileged context of the
new process.


At system initialization, a valid HWPCB is constructed in the Hardware Restart Parameter 
Block (HWRPB) for the primary processor. (See Console Interface (III), Chapter 2.) Thereaf- 
ter, it is the responsibility of operating system software to ensure a valid HWPCB when 
executing a SWPCTX instruction. 
4.5 \Revision History


Revision 7.0, November 13, 1997
1. Alpha AXP -> Alpha
2. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994
1. Alpha -> Alpha AXP
2. Added charged process cycles to HWPCB


Revision 5.0, May 12, 1992
1. Corrected PME description, added process unique value description
2. Added PME, DAT and process unique value to Process definition
3. Added PME bit as per ECO #43
4. Corrected DAT bit description as per ECO #40
5. Added DAT bit and FEN bit description
6. Converted to SDML
7. Added ECO #18, #21
8. Changed ‘CC’ to ‘PCC’ in HWPCB
9. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991
1. Remove references to ASTs as ‘interrupts’, substituting ‘exception’ where appropriate


Revision 3.0, March 2, 1990
1. Lower number of PAL scratch words from 23 to 7
2. Make ASN field be ignored on systems that do not implement ASNs
3. Change ASTRR to ASTSR
4. Change alignment of HWPCB


Revision 2.0, October 4, 1989
1. Add FEN, CC, and PAL scratch areas to HWPCB


Revision 1.0, May 23, 1989
1. First review distribution\
Chapter 5


Internal Processor Registers (II-A)


5.1 Internal Processor Registers 


This chapter describes the OpenVMS Alpha Internal Processor Registers (IPRs). These regis- 
ters are read and written with Move from Processor Register (MFPR) and Move to Processor 
Register (MTPR) instructions. See Section 2.6. 


Those instructions accept an input operand in R16 and return a result, if any, in R0. Registers
R1, R16, and R17 are UNPREDICTABLE after a CALL_PAL MxPR routine. If a
CALL_PAL MxPR routine does not return a result in R0, then R0 is also UNPREDICTABLE
on return.


Some IPRs (for example, ASTSR, ASTEN, IPL) may be both read and written in a combined 
operation by performing an MTPR instruction. 


Internal Processor Registers may or may not be implemented as actual hardware registers. An 
implementation may choose any combination of PALcode and hardware to produce the archi- 
tecturally specified function. Internal Processor Registers are only accessible from kernel 
mode. 


5.2 Stack Pointer Internal Processor Registers 


The stack pointers for user, supervisor, and executive stacks are accessible as IPRs through the
CALL_PAL MTPR and MFPR instructions. An implementation may retain some or all of
these stack pointers only in the HWPCB. In this case, MTPR and MFPR for these registers
must access the corresponding PCB locations. However, implementations that have these
stack pointers in internal hardware registers are not required to access the corresponding
HWPCB locations for MTPR and MFPR. The HWPCB locations get updated when a
SWPCTX instruction is executed.


An implementation may also choose to keep the kernel stack pointer (KSP) in an internal hard- 
ware register (labeled IPR_KSP); however, this register is not directly accessible through 
MTPR and MFPR instructions. Because access to the KSP requires kernel mode, the actual 
KSP is the current mode stack pointer (R30); thus access to KSP is provided through R30, and 
no MTPR or MFPR access is required. PALcode routines can directly access IPR_KSP as 
needed. 


At system initialization, the value of the KSP is taken from the initial HWPCB (see Chapter
4). Table 5-1 summarizes the IPRs.
5.3 IPR Summary


Table 5-1: Internal Processor Register (IPR) Summary 


Register Name                     Mnemonic  Access¹  Input R16  Output R0  Context Switched
Address Space Number              ASN       R        -          Number     Yes
AST Enable                        ASTEN     R/W*     Mask       Mask       Yes
AST Summary Register              ASTSR     R/W*     Mask       Mask       Yes
Data Align Trap Fixup             DATFX     W        Value      -          Yes
Exec Stack Pointer                ESP       R/W      Address    Address    Yes
Floating-point Enable             FEN       R/W      Value      Value      Yes
Interprocessor Int. Request       IPIR      W        Number     -          No
Interrupt Priority Level          IPL       R/W*     Value      Value      No
Kernel Stack Pointer              KSP       None     -          -          Yes
Machine Check Error Summary       MCES      R/W      Value      Value      No
Performance Monitor               PERFMON   W*       IMP        IMP        No
Privileged Context Block Base     PCBB      R        -          Address    No
Processor Base Register           PRBR      R/W      Value      Value      No
Page Table Base Register          PTBR      R        -          Frame      Yes
System Control Block Base         SCBB      R/W      Frame      Frame      No
Software Int. Request Register    SIRR      W        Level      -          No
Software Int. Summary Register    SISR      R        -          Mask       No
Supervisor Stack Pointer          SSP       R/W      Address    Address    Yes
TB Check                          TBCHK     R        Number     Status     No
TB Invalid. All                   TBIA      W        -          -          No
TB Invalid. All Process           TBIAP     W        -          -          No
TB Invalid. Single                TBIS      W        Address    -          No
TB Invalid. Single Data           TBISD     W        Address    -          No
TB Invalid. Single Instruct.      TBISI     W        Address    -          No
User Stack Pointer                USP       R/W      Address    Address    Yes
Virtual Page Table Base           VPTB      R/W      Address    Address    No
Who-Am-I                          WHAMI     R        -          Number     No


¹ Access symbols are defined in Table 5-2.
Table 5-2: Internal Processor Register (IPR) Access Summary


Access Type   Meaning
R             Access by MFPR only.
W             Access by MTPR only.
R/W           Access by MFPR or MTPR.
W*            Read and Write access accomplished by MTPR. See Section 5.1 for details.
R/W*          Access by MFPR or MTPR. Read and Write access accomplished by MTPR.
              See Section 5.1 for details.
None          Not accessible by MTPR or MFPR; accessed by PALcode routines as needed.
5.3.1 Address Space Number (ASN)


Access: 
Read 


Operation: 


IF {ASN are implemented} THEN
    R0 <- ZEXT(ASN)
ELSE
    R0 <- 0


Value at System Initialization: 


Zero 


Format: 


Figure 5-1: Address Space Number (ASN) Register 


R0<63:0>: Address Space Number


Description:


Address Space Numbers (ASNs) are used to further qualify Translation Buffer references. See 
Chapter 3. If ASNs are implemented, the current ASN may be read by executing an MFPR 
instruction specifying ASN. 


As processes are scheduled for execution, the ASN for the next process to execute is loaded 
using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 and Chapter 4. 


The ASN register is an implicit operand to the CALL_PAL MFPR_IPR, TBCHK, and TBISx 
PALcode instructions, in which it is used to qualify the virtual address supplied in R16. 
5.3.2 AST Enable (ASTEN)


Access: 


Read 
Write* 


Operation: 


R0 <- ZEXT(ASTEN<3:0>)        ! Read (MFPR)

R0 <- ZEXT(ASTEN<3:0>)        ! Write* (MTPR)
ASTEN<3:0> <- {{ASTEN<3:0> AND R16<3:0>} OR R16<7:4>}
{check for pending ASTs}


Value at System Initialization: 


Zero 
Format: 


Figure 5-2: AST Enable (ASTEN) Register


<63:4> RAZ | <3> UEN | <2> SEN | <1> EEN | <0> KEN


Description:


The AST Enable Register records the AST enable state for each of the modes: kernel (KEN), 
executive (EEN), supervisor (SEN), and user (UEN). By writing R16 appropriately and then 
executing an MTPR instruction specifying ASTEN, the value of ASTEN may be simulta- 
neously read and modified. R16 contains bit masks that are used to determine the new value of 
ASTEN: 


• Bits R16<0> and R16<4> control the new state of kernel enable.
• Bits R16<1> and R16<5> control the new state of executive enable.
• Bits R16<2> and R16<6> control the new state of supervisor enable.
• Bits R16<3> and R16<7> control the new state of user enable.


An MFPR to ASTEN reads the current value of ASTEN and returns this value in R0.


An MTPR to ASTEN begins by reading the current value of ASTEN and returning this value
in R0. The current value of ASTEN is then ANDed with bits R16<3:0>; these bits preserve (if
set to 1) or clear (if equal to 0) the current state of their corresponding enable modes. The
value produced by this operation is then ORed with bits R16<7:4>; these bits turn on (if set to
1) or do not affect (if equal to 0) their corresponding enable modes. The resulting value is then
written to the ASTEN.


Note: 


All AST enables can be cleared by loading a zero into R16 and executing an MTPR 
instruction specifying ASTEN. To enable an AST for a given mode, load R16 with a mask 
that has bits <3:0> set and one of the bits <7:4> corresponding to the AST mode to be set. 
Then execute an MTPR instruction specifying ASTEN. 
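

The read-and-modify behavior of MTPR to ASTEN can be modeled in Python. This is an illustrative sketch rather than PALcode; the function name `mtpr_asten` is hypothetical, and the check for pending ASTs that follows the write is omitted.

```python
def mtpr_asten(asten, r16):
    """Return (R0, new ASTEN) for an MTPR to ASTEN."""
    r0 = asten & 0xF                   # previous value, zero extended
    keep = r16 & 0xF                   # R16<3:0>: 1 preserves, 0 clears
    turn_on = (r16 >> 4) & 0xF         # R16<7:4>: 1 turns on, 0 has no effect
    return r0, ((asten & keep) | turn_on) & 0xF
```

For example, `mtpr_asten(0b0001, 0x8F)` preserves the kernel enable and turns on the user enable, giving a new ASTEN of `0b1001`, while `mtpr_asten(0b1010, 0)` clears all four enables and returns the old value `0b1010` in R0.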


\ASTEN is not present in the VAX architecture. It was added to the Alpha architecture to 
allow software (especially nonprivileged software) to enable and disable ASTs efficiently for 
the current mode via the SWASTEN instruction. It is anticipated that, with multitasking, it 
will become extremely important to be able to enable and disable ASTs efficiently in share- 
able runtime support routines.\ 


As processes are scheduled for execution, the state of the AST enables for the next process to 
execute is loaded using the Swap Privileged Context (SWPCTX) instruction. The Swap AST 
Enable (SWASTEN) instruction can be used to change the enable state for the current access 
mode. See Section 2.1.13 and Chapter 4. 
5.3.3 AST Summary Register (ASTSR)


Access: 


Read 
Write* 


Operation: 


R0 <- ZEXT(ASTSR<3:0>)                                   ! Read (MFPR)

R0 <- ZEXT(ASTSR<3:0>)                                   ! Write* (MTPR)
ASTSR<3:0> <- {{ASTSR<3:0> AND R16<3:0>} OR R16<7:4>}
{check for pending ASTs}


Value at System Initialization: 


Zero 
Format: 


Figure 5-3: AST Summary Register (ASTSR) 


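 63                    8   7   6   5   4   3   2   1   0
+------------------------+---+---+---+---+---+---+---+---+
|          IGN           |UON|SON|EON|KON|UCL|SCL|ECL|KCL|  R16
+------------------------+---+---+---+---+---+---+---+---+

 63                                    4   3   2   1   0
+----------------------------------------+---+---+---+---+
|                  RAZ                   |UPD|SPD|EPD|KPD|  R0
+----------------------------------------+---+---+---+---+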

Description: 


The AST Summary Register records the AST pending state for each of the modes: kernel 
(KPD), executive (EPD), supervisor (SPD), and user (UPD). 


By writing R16 appropriately and then executing an MTPR instruction specifying ASTSR, the
value of ASTSR may be simultaneously read and modified. R16 contains bit masks used to
determine the new value of ASTSR:


• Bits R16<0> and R16<4> control the new state of kernel pending.

• Bits R16<1> and R16<5> control the new state of executive pending.

• Bits R16<2> and R16<6> control the new state of supervisor pending.

• Bits R16<3> and R16<7> control the new state of user pending.


An MFPR reads the current value of ASTSR and returns this value in R0.

An MTPR to ASTSR begins by reading the current value of ASTSR and returning this value
in R0. The current value of ASTSR is then ANDed with bits R16<3:0>; these bits preserve (if
set to 1) or clear (if equal to 0) the current state of their corresponding pending modes. The
value produced by this operation is then ORed with bits R16<7:4>; these bits turn on (if set to
1) or do not affect (if equal to 0) their corresponding pending modes. The resulting value is
then written to the ASTSR.


Note:


All AST requests can be cleared by loading a zero in R16 and executing an MTPR 
instruction specifying ASTSR. To request an AST for a given mode, load R16 with a 
mask that has bits <3:0> set and one of the bits <7:4> corresponding to the AST mode to 
be set. Then execute an MTPR instruction specifying ASTSR. 
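
The MTPR read-modify-write described above can be sketched as a small Python model. This is an illustration only, not PALcode: the names mtpr_ast, request_mask, and the mode constants are hypothetical and are not defined by the architecture. The model assumes nothing beyond the operation shown for this register (return the old value, AND with R16<3:0>, then OR with R16<7:4>).

```python
# Hypothetical bit positions for the four access modes within <3:0>.
KERNEL, EXECUTIVE, SUPERVISOR, USER = 0, 1, 2, 3


def mtpr_ast(current, r16):
    """Model an MTPR to a 4-bit AST register (ASTSR or ASTEN).

    Returns (r0, new_value), where r0 is the value read before the update
    and new_value is ((current AND R16<3:0>) OR R16<7:4>).
    """
    old = current & 0xF
    keep_mask = r16 & 0xF          # R16<3:0>: 1 preserves a bit, 0 clears it
    set_mask = (r16 >> 4) & 0xF    # R16<7:4>: 1 turns a bit on, 0 leaves it alone
    return old, (old & keep_mask) | set_mask


def request_mask(mode):
    """Build an R16 value that keeps all current bits and sets one mode bit."""
    return 0x0F | (1 << (4 + mode))
```

For example, mtpr_ast(0b0101, 0) models clearing every request, and mtpr_ast(0b0001, request_mask(USER)) models adding a user-mode request while preserving the kernel-mode one.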


As processes are scheduled for execution, the pending AST state for the next process to exe- 
cute is loaded using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 
and Chapter 4. 


When the processor IPL is less than 2, and proper enabling conditions are present, an AST 
interrupt is initiated at IPL 2 and the corresponding access mode bit in ASTSR is cleared. See 
Section 6.7.6. 

5.3.4 Data Alignment Trap Fixup (DATFX)


Access: 


Write 


Operation: 


DATFX <- R16<0>
(HWPCB+56)<63> <- DATFX


Value at System Initialization: 


Zero 
Format: 


Figure 5-4: Data Alignment Trap Fixup (DATFX)


63                                     1   0
                                         DAT

Description: 


Data Alignment traps are fixed up in PALcode and are reported to the operating system under 
the control of the DAT bit. If the bit is zero, the trap is reported. For the LDx_L and STx_C 
instructions, no fixup is possible and an illegal operand exception is generated. 


For the description of the data alignment traps, see Section 6.6. 

5.3.5 Executive Stack Pointer (ESP)


Access: 


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN      ! Read
    R0 <- ESP
ELSE
    R0 <- (IPR_PCBB + HWPCB_ESP)


IF {internal registers for stack pointers} THEN      ! Write
    ESP <- R16
ELSE
    (IPR_PCBB + HWPCB_ESP) <- R16


Value at System Initialization: 


Value in the initial HWPCB 
Format: 


Figure 5-5: Executive Stack Pointer (ESP) 


63                                                 0


Stack Address


Description: 


This register allows the stack pointer for executive mode (ESP) to be read and written via
MFPR and MTPR instructions that specify ESP.


The current stack pointer may be read and written directly by specifying scalar register SP
(R30).


As processes are scheduled for execution, the stack pointers for the next process to execute are
loaded using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 and
Chapter 4.
5.3.6 Floating Enable (FEN) 


Access: 
Read/Write 
Operation: 
R0 <- ZEXT(FEN)              ! Read
FEN <- R16<0>                ! Write
(HWPCB+56)<0> <- FEN         ! Update PCB on Write


Value at System Initialization: 


Zero 
Format: 


Figure 5-6: Floating Enable (FEN) Register 


Description: 


The floating-point unit can be disabled with the CALL_PAL CLRFEN instruction. If the Float- 
ing Enable Register (FEN) is zero, all instructions that have floating registers as operands 
cause a floating-point disabled fault. See Section 6.3.1.1. 

5.3.7 Interprocessor Interrupt Request (IPIR)


Access: 


Write 


Operation: 
IPIR <- R16
Value at System Initialization: 


Not applicable 


Format: 


Figure 5-7: Interprocessor Interrupt Request (IPIR) Register


63                                                 0


Processor Number


R16


Description: 


An interprocessor interrupt can be requested on a specified processor by writing that proces- 
sor’s number into the IPIR register through an MTPR instruction. The interrupt request is 
recorded on the target processor and is initiated when proper enabling conditions are present. 


Programming Note: 


The interrupt need not be initiated before the next instruction is executed on the requesting 
processor, even if the requesting processor is also the target processor for the request. 


For additional information on interprocessor interrupts, see Section 6.4.6. 

5.3.8 Interrupt Priority Level (IPL)


Access: 
Read/Write* 
Operation: 
R0 <- ZEXT(PS<IPL>)          ! Read
R0 <- ZEXT(PS<IPL>)          ! Write*
PS<IPL> <- R16<4:0>          ! Write


{check for pending ASTs or interrupts} 


Value at System Initialization: 


31
Format: 


Figure 5-8: Interrupt Priority Level (IPL) 


63 


SBZ IPL 


Description: 


An MFPR IPL returns the current interrupt priority level in R0. An MTPR IPL returns the
current interrupt priority level in R0 and sets the interrupt priority level to the value in R16. If
proper enabling conditions are present, an interrupt or AST is initiated prior to issuing the next
instruction. See Sections 6.4.2 and 6.7.6. R16<63:5> are defined as RAZ/SBZ. Therefore, the
presence of nonzero bits upon write in R16<63:5> may cause UNDEFINED results.

5.3.9 Machine Check Error Summary Register (MCES)


Access: 


Read/Write 

Operation: 
R0 <- ZEXT(MCES)                        ! Read
IF {R16<0> EQ 1} THEN MCES<0> <- 0      ! Write
IF {R16<1> EQ 1} THEN MCES<1> <- 0
IF {R16<2> EQ 1} THEN MCES<2> <- 0
MCES<3> <- R16<3>
MCES<4> <- R16<4>


Value at System Initialization: 


Zero 


Format: 


Figure 5-9: Machine Check Error Summary (MCES) Register 


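 63              32 31              5   4   3   2   1   0
+------------------+------------------+---+---+---+---+---+
|       IMP        |     Reserved     |DSC|DPC|PCE|SCE|MCK|
+------------------+------------------+---+---+---+---+---+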


Description: 
The use of the MCES IPR is described in Section 6.5. 


MCK (MCES<0>) is set by the hardware or PALcode when a processor or system machine
check occurs. SCE (MCES<1>) is set by the hardware or PALcode when a system correctable
error occurs. PCE (MCES<2>) is set by the hardware or PALcode when a processor
correctable error occurs.


Setting the corresponding bit(s) in R16 clears MCK, SCE, and PCE. MCK is cleared by the 
operating system machine check error handler and used by the hardware or PALcode to detect 
double machine checks. SCE and PCE are cleared by the operating system or processor sys- 
tem correctable error handlers; these bits are used to indicate that the associated correctable 
error logout area may be reused by hardware or PALcode. In the event of double correctable 
errors, PALcode does not overwrite the logout area and does not force the processor to enter 
console I/O mode. See Section 6.5.1. 

DPC (MCES<3>) and DSC (MCES<4>) are used to disable reporting of correctable errors to 
system software. The generation and correction of the machine check are not affected; only 
the report to system software is disabled. Setting DPC disables reporting of processor-correct- 
able machine checks. Setting DSC disables reporting of system-correctable machine checks. 
Implementation-dependent (IMP) bits may be used to report implementation-specific errors. 
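
The write semantics above mix two behaviors: bits <2:0> are cleared by writing a one, while bits <4:3> are copied directly from R16. A minimal Python sketch of that rule follows. It is an illustration under stated assumptions, not PALcode: the function name mtpr_mces and the bit-position constants are hypothetical, and the model leaves bits above <4> untouched because their write behavior is implementation specific.

```python
# Bit positions taken from the MCES format; the constant names are illustrative.
MCK, SCE, PCE, DPC, DSC = 0, 1, 2, 3, 4


def mtpr_mces(mces, r16):
    """Model the documented effect of an MTPR to MCES on bits <4:0>."""
    new = mces
    # MCK, SCE, PCE: writing a 1 clears the bit; writing a 0 leaves it alone.
    for bit in (MCK, SCE, PCE):
        if (r16 >> bit) & 1:
            new &= ~(1 << bit)
    # DPC, DSC: the bit is replaced by the corresponding bit of R16.
    for bit in (DPC, DSC):
        new = (new & ~(1 << bit)) | (r16 & (1 << bit))
    return new
```

One consequence visible in the model is that a write intended only to clear MCK, SCE, or PCE also rewrites DPC and DSC, so a caller that wants to keep the current reporting state would need to supply the existing DPC and DSC values in R16<4:3>.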

5.3.10 Performance Monitoring Register (PERFMON)


Access: 


Write* 


Operation: 


! R16 contains implementation specific input values
! R17 contains implementation specific input values
! R0 may return implementation specific values
! Operations and actions taken are implementation specific


Value at System Initialization: 


Implementation Dependent 


Format: 


Figure 5-10: Performance Monitoring (PERFMON) Register 


63 


Description: 


The arguments and actions of this performance monitoring function are platform and chip
dependent. The functions, when defined for an implementation, are described in Appendix E.


R16 and R17 contain implementation-dependent input values. Implementation-specific values
may be returned in R0.

5.3.11 Privileged Context Block Base (PCBB)


Access: 


Read 
Operation: 
R0 <- ZEXT(PCBB)


Value at System Initialization: 


Address of processor’s bootstrap HWPCB 


Format: 


Figure 5-11: Privileged Context Block Base (PCBB) Register 


63        48 47                                    0
    RAZ              Physical Address


R0


Description: 
The Privileged Context Block Base Register contains the physical address of the privileged
context block for the current process. It may be read by executing an MFPR instruction
specifying PCBB.


PCBB is written by the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 
and Chapter 4. 

5.3.12 Processor Base Register (PRBR)


Access: 
Read/Write 

Operation: 
R0 <- PRBR                   ! Read
PRBR <- R16                  ! Write


Value at System Initialization: 


UNPREDICTABLE 


Format: 


Figure 5-12: Processor Base Register (PRBR) 


63 0 


Operating System-Dependent Value 





Description: 


In a multiprocessor system, it is desirable for the operating system to be able to locate a proces- 
sor-specific data structure in a simple and straightforward manner. The Processor Base 
Register provides a quadword of operating system-dependent state that can be read and written 
via MFPR and MTPR instructions that specify PRBR. 

5.3.13 Page Table Base Register (PTBR)


Access: 


Read 


Operation: 
R0 <- PTBR


Value at System Initialization: 


Value in the bootstrap HWPCB 
Format: 


Figure 5-13: Page Table Base Register (PTBR)


63              32 31                              0


       RAZ              Page Frame Number


R0


Description:


The Page Table Base Register contains the page frame number of the highest-level page table
for the current process. It may be read by executing an MFPR instruction specifying PTBR.
See Chapter 3.


As processes are scheduled for execution, the PTBR for the next process to execute is loaded
using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 and Chapter 4.

5.3.14 System Control Block Base (SCBB)


Access: 
Read/Write 

Operation: 
R0 <- ZEXT(SCBB)             ! Read
SCBB <- R16                  ! Write


Value at System Initialization: 


UNPREDICTABLE 
Format: 


Figure 5-14: System Control Block Base (SCBB) Register 


63 32 31 0 


IGN/RAZ Page Frame Number 


Description: 


The System Control Block Base Register holds the Page Frame Number (PFN) of the System
Control Block, which is used to dispatch exceptions and interrupts. It may be read and
written by executing MFPR and MTPR instructions that specify SCBB. See Section 6.6.


When SCBB is written, the specified physical address must be the PFN of a page that is nei- 
ther in I/O space nor nonexistent memory, or UNDEFINED operation will result. 

5.3.15 Software Interrupt Request Register (SIRR)



Access: 


Write 


Operation: 


IF R16<3:0> NE 0 THEN
    SISR<R16<3:0>> <- 1


Value at System Initialization: 


Not applicable 


Format: 


Figure 5-15: Software Interrupt Request Register (SIRR) 


63 


IGN LVL 


R16 


Description: 


A software interrupt may be requested for a particular Interrupt Priority Level (IPL) by execut- 
ing an MTPR instruction specifying SIRR. Software interrupts may be requested at levels 0 
through 15 (requests at level 0 are ignored). 


An MTPR SIRR sets the bit corresponding to the specified interrupt level in the Software 
Interrupt Summary Register (SISR). 


If proper enabling conditions are present, a software interrupt is initiated prior to issuing the 
next instruction. See Sections 6.4.1 and 6.7.6. 

5.3.16 Software Interrupt Summary Register (SISR)


Access: 


Read 


Operation: 
R0 <- ZEXT(SISR<15:0>)


Value at System Initialization: 


Zero 


Format: 


Figure 5-16: Software Interrupt Summary Register (SISR) 





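 63      16  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
+----------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   RAZ    |IRF|IRE|IRD|IRC|IRB|IRA|IR9|IR8|IR7|IR6|IR5|IR4|IR3|IR2|IR1|RAZ|
+----------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+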


Description:


The Software Interrupt Summary Register records the interrupt pending state for each of the 
interrupt levels 1 through 15. The current interrupt pending state may be read by executing an 
MFPR instruction specifying SISR. 


MTPR SIRR (see SIRR) requests an interrupt at a particular interrupt level and sets the corre- 
sponding pending bit in SISR. 


When the processor IPL falls below the level of a pending request, an interrupt is initiated and 
the corresponding bit in SISR is cleared. See Sections 6.4.1 and 6.7.6. 
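
The relationship between SIRR requests and the SISR pending bits can be sketched in Python. This is a simplified model, not PALcode: the function names mtpr_sirr and deliver are hypothetical, and the model assumes that when several requests are pending the highest eligible level is the one initiated (the priority rules themselves are covered in the sections referenced above).

```python
def mtpr_sirr(sisr, r16):
    """Model MTPR SIRR: set the SISR bit for level R16<3:0>; level 0 is ignored."""
    level = r16 & 0xF
    if level != 0:
        sisr |= 1 << level
    return sisr & 0xFFFE           # only levels 1 through 15 are recorded


def deliver(sisr, ipl):
    """Return (level, new_sisr) for the interrupt initiated at this IPL, if any.

    A request is eligible only when the processor IPL is below its level.
    Assumption: the highest eligible level is chosen. Initiating the
    interrupt clears the corresponding SISR bit.
    """
    for level in range(15, 0, -1):
        if (sisr >> level) & 1 and ipl < level:
            return level, sisr & ~(1 << level)
    return None, sisr
```

For example, with requests pending at levels 3 and 5 and the processor at IPL 4, deliver selects level 5 and leaves the level 3 request pending until the IPL falls below 3.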


5.3.17 Supervisor Stack Pointer (SSP) 


Access: 


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN      ! Read
    R0 <- SSP
ELSE
    R0 <- (IPR_PCBB + HWPCB_SSP)


IF {internal registers for stack pointers} THEN      ! Write
    SSP <- R16
ELSE
    (IPR_PCBB + HWPCB_SSP) <- R16


Value at System Initialization: 


Value in the initial HWPCB 


Format: 


Figure 5-17: Supervisor Stack Pointer (SSP) 


63 0 


Stack Address 


Description: 


The Supervisor Stack Pointer register allows the stack pointer for supervisor mode (SSP) to be 
read and written by using MFPR and MTPR instructions that specify SSP. 


The current stack pointer may be read and written directly by specifying scalar register SP 
(R30). 


As processes are scheduled for execution, the stack pointers for the next process to execute are 
loaded using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 and Chap- 
ter 4. 

5.3.18 Translation Buffer Check (TBCHK)


Access: 


Read 


Operation: 


R0 <- 0
IF {implemented} THEN
    R0<0> <- {indicator that VA in R16 is in TB}
ELSE
    R0<63> <- 1


Value at System Initialization: 


Correct results are always returned 


Format: 


Figure 5-18: Translation Buffer Check Register (TBCHK) 


63                                                 0
Virtual Address
R16

63 62                                           1  0
      RAZ                                        PRS
R0

Description: 


The Translation Buffer Check Register provides the capability to determine if a virtual address 
is present in the Translation Buffer by executing an MFPR instruction specifying TBCHK. 
See Chapter 3. 


The virtual address to be checked is specified in R16 and may be any address within the 
desired page. If ASNs are implemented, only those Translation Buffer entries that are associ- 
ated with the current value of the ASN IPR will be checked for the virtual address. The value 
read contains an indication of whether the function is implemented and whether the virtual 
address is present in the Translation Buffer. 


If the function is not implemented, a one is returned in bit <63> and bit <0> is clear. Other- 
wise, bit <63> is clear and bit <0> indicates the presence or absence of the virtual address in 
the Translation Buffer. Bit <0> set indicates the virtual address is present; bit <0> clear indi- 
cates it is absent. 


The TBCHK register can be used by system software for working set management. 
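
The three possible outcomes of an MFPR TBCHK can be decoded from R0 as in the following Python sketch. It is illustrative only; the function name decode_tbchk is hypothetical, and the sketch relies solely on the meanings of bits <63> and <0> given above.

```python
def decode_tbchk(r0):
    """Interpret the R0 value returned by MFPR TBCHK.

    Returns None if the function is not implemented (bit <63> set),
    True if the virtual address is present in the TB (bit <0> set),
    and False if it is absent.
    """
    if (r0 >> 63) & 1:
        return None
    return bool(r0 & 1)
```

Software that uses TBCHK for working set management would typically need a fallback path for the None case, since the function is optional.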

5.3.19 Translation Buffer Invalidate All (TBIA)


Access: 


Write 


Operation: 
{Invalidate all TB entries} 


Value at System Initialization: 


Not applicable 


Format: 


Figure 5-19: Translation Buffer Invalidate All (TBIA) Register 


63 0 
R16 
Description: 


The Translation Buffer Invalidate All Register provides the capability to invalidate all entries 
in the Translation Buffer by executing an MTPR instruction specifying TBIA. See Chapter 3. 

5.3.20 Translation Buffer Invalidate All Process (TBIAP)


Access: 


Write 


Operation: 
{Invalidate all TB entries with PTE<ASM> clear} 


Value at System Initialization: 


Not applicable 


Format: 


Figure 5-20: Translation Buffer Invalidate All Process (TBIAP) Register 


63 0 
R16 
Description: 


The Translation Buffer Invalidate All Process Register provides the capability to invalidate all 
entries in the Translation Buffer that do not have the ASM bit set by executing an MTPR 
instruction specifying TBIAP. See Chapter 3. 


Notes: 


More entries may be invalidated by this operation. For example, some implementations 
may flush the entire TB on a TBIAP. 

5.3.21 Translation Buffer Invalidate Single (TBISx)


Access: 


Write 


Operation: 


TBIS: 
{Invalidate single Data TB entry using R16} 
{Invalidate single Instruction TB entry using R16} 
TBISD: 


{Invalidate single Data TB entry using R16} 
TBISI: 


{Invalidate single Instruction TB entry using R16} 


Value at System Initialization: 


Not applicable 
Format: 


Figure 5-21: Translation Buffer Invalidate Single (TBIS) 


63 0 
R16 
Description: 


The Translation Buffer Invalidate Single Registers provide the capability to invalidate a single
entry in the Instruction Translation Buffer (TBISI), the Data Translation Buffer (TBISD), or
both translation buffers (TBIS). The virtual address to be invalidated is passed in R16 and may
be any address within the desired page.


Notes: 


• More than the single entry may be invalidated by this operation. For example, some
  implementations may flush the entire TB on a TBIS. As a result, if the specified address
  does not match any entry in the Translation Buffer, then it is implementation dependent
  whether the state of the Translation Buffer is affected by the operation.

5.3.22 User Stack Pointer (USP)


Access: 


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN      ! Read
    R0 <- USP
ELSE
    R0 <- (IPR_PCBB + HWPCB_USP)


IF {internal registers for stack pointers} THEN      ! Write
    USP <- R16
ELSE
    (IPR_PCBB + HWPCB_USP) <- R16


Value at System Initialization: 


Value in the initial HWPCB 
Format: 


Figure 5-22: User Stack Pointer (USP) 


63 0 
Stack Address 
Description: 


This register allows the stack pointer for user mode (USP) to be read and written via MFPR 
and MTPR instructions that specify USP. 


The current stack pointer may be read and written directly by specifying scalar register SP 
(R30). 


As processes are scheduled for execution, the stack pointers for the next process to execute are
loaded using the Swap Privileged Context (SWPCTX) instruction. See Section 2.6.7 and
Chapter 4.

5.3.23 Virtual Page Table Base (VPTB)


Access: 
Read/Write 
Operation:
R0 <- VPTB                   ! Read
VPTB <- R16                  ! Write


Value at System Initialization: 


Initialized by the console in the bootstrap address space. 
Format: 


Figure 5-23: Virtual Page Table Base (VPTB) Register 


63                                                 0
R0
Description: 


The Virtual Page Table Base Register contains the virtual address of the base of the entire 
multi-level page table structure. It may be read by executing an MFPR instruction specifying 
VPTB. It is written at system initialization using an MTPR instruction specifying VPTB. See 
Section 3.8.2 and Console Interface (III), Chapter 3, for initialization considerations. 

5.3.24 Who-Am-I (WHAMI)


Access: 

Read 
Operation: 

R0 <- WHAMI


Value at System Initialization: 


Processor number 


Format: 


Figure 5-24: Who-Am-I (WHAMI) Register


63                                                 0


Processor Number


R0


Description: 


The Who-Am-I Register provides the capability to read the current processor number by
executing an MFPR instruction specifying WHAMI. The processor number returned is in the
range 0 through N-1, where N is the number of processors that can be configured in the
system. Processor number FFFF FFFF FFFF FFFF (hexadecimal) is reserved.


The current processor number is useful in a multiprocessing system to index arrays that store 
per processor information. Such information is operating system dependent. 

5.4 \Revision History


Revision 7.0, November 13, 1997 


1. Added ECO 99, Extended VA (PTBR and VPTB registers)
2. Added reference to clrfen instruction, eco 101
3. OpenVMS AXP --> OpenVMS Alpha
4. Alpha AXP --> Alpha
5. Converted into FrameMaker


Revision 6.0, December 1994 


1. Editorial changes to MCES for clarity
2. Alpha --> Alpha AXP
3. Added ECO 50, correction to PERFMON IPR description


Revision 5.0, May 12, 1992 


1. Added changes to MCES for ECO #45
2. Added Perfmon ipr description and entry in summary table
3. Added DATFX related ecos #30, #40
4. Added bit field to FEN reference to PCBB as a result of datfx ecos
5. Added VPTB register
6. Rewrite of MCES description
7. Converted to SDML
8. Added ECO #16, #17, #20, #23, #24
9. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991 


1. MTPR IPL returns old IPL in R0
2. Typos
3. Change MCES IGN/RAZ field to IMP
4. Describe how to clear and set mode enable bits with MTPR
5. Change text for ASTSR description to indicate future action for mode set
6. Change ASTEN and ASTSR to access type Read/Write
7. Modify (subtly) note under IPIR to avoid confusion about timing relation
   between processors
8. Clarify what value to load into IPIR to select a particular target
9. Change ‘Value at System Initialization’ from ‘UNDEFINED’ to ‘UNPREDICTABLE’
   for PRBR and SCBB
10. Note effect of writing TBIS with an address that does not match any TB
    entry
11. Note that ASN is an implicit operand to a MFPR TBCHK instruction
12. Emphasise distinction between SIRR and SISR
13. Reworked IPR table to show which IPRs are context-switched and which
    are not.
14. Remove references to ASTs as ‘interrupts’, substituting ‘exception’ where
    appropriate
15. Insert spaces into long hex and binary values to improve legibility
16. Clarify obscure use of MTPR to both read and write certain IPRs
17. Illustrate R16 bits used to ‘gate’ ASTEN and ASTSR contents into R0
18. Add pointer in the IPIR IPR section pointing to Interprocessor Interrupt material in
    Chapter 6
19. Add kernel stack pointer as an internal processor register
20. Modify definition of Absolute Time register and BB_WATCH entity.
21. Changed IPR Summary Table and added R/W* description
22. Specified all systems that support VAX or ULTRIX must have a BB_WATCH
23. Clarified value written to IPIR to select a processor

Revision 3.0, March 2, 1990 


1. Remove ASTRR and make ASTEN/ASTSR read/write
2. Add TBIAP
3. Remove ASN from TBIx and TBCHK
4. Remove R17 as input to MxPR’s
5. Reserve processor number FFFF FFFF FFFF FFFF (hexadecimal)


Revision 2.0, October 4, 1989 


1. Remove ICIE, IPIE, ISP, KSP, SID, SSN, and TOY
2. Add AT and FEN
3. Change range of WHAMI
4. Remove stack alignment comments
5. Change registers used to match calling standard


Revision 1.0, March 15, 1989 


1. First review distribution.\

Chapter 6


Exceptions, Interrupts, and Machine Checks (II-A)


6.1 Introduction 


At certain times during the operation of a system, events within the system require the execu- 
tion of software outside the explicit flow of control. When such an exceptional event occurs, 
an Alpha processor forces a change in control flow from that indicated by the current instruc- 
tion stream. The notification process for such events is of one of three types: 


Exceptions 


These events are relevant primarily to the currently executing process and normally 
invoke software in the context of the current process. The three types of exceptions 
are faults, arithmetic traps, and synchronous traps. Exceptions are described in Section 
6.3. 


Interrupts 


These events are primarily relevant to other processes or to the system as a whole and 
are typically serviced in a system-wide context. 


Some interrupts are of such urgency that they require high-priority service, while 
others must be synchronized with independent events. To meet these needs, each 
processor has priority logic that grants interrupt service to the highest priority event at 
any point in time. Interrupts are described in Section 6.4. 


Machine Checks 


These events are generally the result of serious hardware failure. The registers and 
memory are potentially in an indeterminate state such that the instruction execution 
cannot necessarily be correctly restarted, completed, simulated, or undone. Machine 
checks are described in Section 6.5. 


For all such events, the change in flow of control involves changing the Program Counter 
(PC), possibly changing the execution mode (current mode) and/or interrupt priority level 
(IPL) in the Processor Status (PS), and saving the old values of the PC and PS. The old values 
are saved on the target stack as part of an Exception, Interrupt, or Machine Check Stack 
Frame. Collectively, those elements are described in Section 6.2. 


The service routines that handle exceptions, interrupts, and machine checks are specified by 
entry points in the System Control Block (SCB), described in Section 6.6. 

Return from an exception, interrupt, or machine check is done via the CALL_PAL REI
instruction. As part of its work, CALL_PAL REI restores the saved values of PC and PS and
pops them off the stack.


6.1.1 Differences Between Exceptions, Interrupts, and Machine Checks 


Generally, exceptions, interrupts, and machine checks are similar. However, there are four 
important differences: 


1. An exception is caused by the execution of an instruction. An interrupt is caused by
   some activity in the system that may be independent of any instruction. A machine
   check is associated with a hardware error condition.


2. The IPL of the processor is not changed when the processor initiates an exception. The
   IPL is always raised when an interrupt is initiated. The IPL is always raised when a
   machine check is initiated, and for all machine checks other than system correctable, is
   raised to 31 (highest priority level). (For system correctable machine checks, the IPL is
   raised to 20.)


3. Exceptions are always initiated immediately, no matter what the processor IPL is.
   Interrupts are deferred until the processor IPL drops below the IPL of the requesting
   source. Machine checks can be initiated immediately or deferred, depending on error
   conditions.


4. Some exceptions can be selectively disabled by selecting instructions that do not check
   for exception conditions. If an exception condition occurs in such an instruction, the
   condition is totally ignored and no state is saved to signal that condition at a later time.


   If an interrupt request occurs while the processor IPL is equal to or greater than that of
   the interrupting source, the condition will eventually initiate an interrupt if the
   interrupt request is still present and the processor IPL is lowered below that of the
   interrupting source.


   Machine checks cannot be disabled. Machine checks can be initiated immediately or
   deferred, depending on the error condition. Also, they can be deliberately generated
   by software.
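
The IPL rules in the list above can be summarized in a short Python sketch. It is a simplified illustration, not a description of any implementation: the function names and the string used to identify a system correctable machine check are hypothetical, and only the IPL values 31 and 20 come from the text.

```python
HIGHEST_IPL = 31              # IPL for all machine checks except system correctable
SYSTEM_CORRECTABLE_IPL = 20   # IPL for system correctable machine checks


def interrupt_can_start(processor_ipl, source_ipl):
    """An interrupt is deferred until the processor IPL is below the source IPL."""
    return processor_ipl < source_ipl


def ipl_on_exception(processor_ipl):
    """Initiating an exception does not change the processor IPL."""
    return processor_ipl


def ipl_on_machine_check(kind):
    """Return the IPL the processor is raised to when a machine check is initiated."""
    if kind == "system_correctable":
        return SYSTEM_CORRECTABLE_IPL
    return HIGHEST_IPL
```

For instance, a request from a source at IPL 2 stays pending while the processor runs at IPL 2 or above and becomes eligible once the processor IPL drops to 1 or 0.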


6.1.2 Exceptions, Interrupts, and Machine Checks Summary 


Table 6—1 summarizes the actions taken on an exception, interrupt, or machine check. The 
remaining sections in this chapter describe those actions in greater detail. 

• The "SavedPC" column describes what is saved in the "PC" field of the exception or
  interrupt or machine check stack frame.

  1. "Current" indicates the PC of the instruction at which the exception or interrupt or
     machine check was taken.

  2. "Next" indicates the PC of the successor instruction.


• The "NewMode" column specifies the mode and stack that the exception or interrupt or
  machine check routine will start with. For change mode traps, "MostPrv" indicates the
  more privileged of the current and new modes.


• The "R2" column specifies the value with which R2 is loaded, after its original value
  has been saved in the exception or interrupt or machine check stack frame. The SCB
  vector quadword, "SCBv", is loaded into R2 for all interrupts and exceptions and
  machine checks.


• The "R3" column specifies the value with which R3 is loaded, after its original value
  has been saved in the exception or interrupt or machine check stack frame. The SCB
  parameter quadword, "SCBp", is loaded into R3 for all interrupts and exceptions and
  machine checks.


• The "R4" column specifies the value with which R4 is loaded, after its original value
  has been saved in the exception or interrupt or machine check stack frame. If the "R4"
  column is blank, the value in R4 is UNPREDICTABLE on entry to an interrupt or
  exception.

  1. "VA" indicates the exact virtual address that triggered a memory management fault
     or data alignment trap.

  2. "Mask" indicates the Register Write Mask.

  3. "LAOff" indicates the offset from the base of the logout area in the HWRPB (see
     Section 6.5.2).


• The "R5" column specifies the value with which R5 is loaded, after its original value
  has been saved in the exception or interrupt or machine check stack frame. If the "R5"
  column is blank, the value in R5 is UNPREDICTABLE on entry to an interrupt or
  exception or machine check.

  1. "MMF" indicates the Memory Management Flags.

  2. "Exc" indicates the Exception Summary parameter.

  3. "RW" indicates Read/Load = 0, Write/Store = 1 for data alignment traps.


Table 6-1: Exceptions, Interrupts, and Machine Checks Summary


                            SavedPC   NewMode   R2     R3     R4      R5

Exceptions - Faults:
Floating Disabled Fault     Current   Kernel    SCBv   SCBp

Memory Management Faults:
Access Control Violation    Current   Kernel    SCBv   SCBp   VA      MMF
Translation Not Valid       Current   Kernel    SCBv   SCBp   VA      MMF
Fault on Read               Current   Kernel    SCBv   SCBp   VA      MMF
Fault on Write              Current   Kernel    SCBv   SCBp   VA      MMF
Fault on Execute            Current   Kernel    SCBv   SCBp   VA      MMF

Exceptions - Arithmetic Traps:
Arithmetic Traps            Next      Kernel    SCBv   SCBp   Mask    Exc

Exceptions - Synchronous Traps:
Breakpoint Trap             Next      Kernel    SCBv   SCBp
Bugcheck Trap               Next      Kernel    SCBv   SCBp
Change Mode to K/E/S/U      Next      MostPrv   SCBv   SCBp
Illegal Instruction         Next      Kernel    SCBv   SCBp
Illegal Operand             Next      Kernel    SCBv   SCBp
Data Alignment Trap         Next      Kernel    SCBv   SCBp   VA      RW

Interrupts:
Asynch System Trap (4)      Current   Kernel    SCBv   SCBp
Interval Clock              Current   Kernel    SCBv   SCBp
Interprocessor Interrupt    Current   Kernel    SCBv   SCBp
Software Interrupts         Current   Kernel    SCBv   SCBp
Performance monitor         Current   Kernel    SCBv   SCBp   IMP     IMP
Passive Release             Current   Kernel    SCBv   SCBp
Powerfail                   Current   Kernel    SCBv   SCBp
I/O Device                  Current   Kernel    SCBv   SCBp

Machine Checks:
Processor Correctable       Current   Kernel    SCBv   SCBp   LAOff
System Correctable          Current   Kernel    SCBv   SCBp   LAOff
System                      Current   Kernel    SCBv   SCBp   LAOff
Processor                   Current   Kernel    SCBv   SCBp   LAOff
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6.2 Processor State and Exception/Interrupt/Machine 
Check Stack Frame 


Processor state consists of a quadword of privileged information called the Processor Status 
(PS) and a quadword containing the Program Counter (PC), which is the virtual address of the 
next instruction. 


When an exception, interrupt, or machine check is initiated, the current processor state
must be preserved. This is accomplished by automatically pushing the PS and the PC on the
target stack.


Subsequently, instruction execution can be continued at the point of the exception, interrupt, 
or machine check by executing a CALL_PAL REI instruction (see Section 2.1.11). 


Process context such as memory mapping information is not saved or restored on each excep- 
tion, interrupt, or machine check. Instead, it is saved and restored when process context 
switching is performed. Other processor status is changed even less frequently (see Chapter 4). 


6.2.1 Processor Status 


The PS can be explicitly read with the CALL_PAL RD_PS instruction. The PS<SW> field 
can be explicitly written with the CALL_PAL WR_PS_SW instruction. See Section 2.1. 


The terms current PS and saved PS are used to distinguish between this status information 
when it is stored internal to the processor and when copies of it are materialized in memory. 
The current PS is shown in Figure 6-1, the saved PS in Figure 6-2, and the bits for both are
described in Table 6-2.


Figure 6-1: Current Processor Status (PS Register)


Figure 6-2: Saved Processor Status (PS on Stack)


Table 6-2: Processor Status Register Summary


Bits Description 


63-62 Reserved to DIGITAL, MBZ. 


61-56 Stack alignment (SP_ALIGN) 
The previous stack byte alignment within a 64-byte aligned area, in the range 0 to 
63. This field is set in the saved PS during the act of taking an exception or inter- 
rupt; it is used by the CALL_PAL REI instruction to restore the previous stack 
byte alignment. 


55-13 Reserved to DIGITAL, MBZ. 


12-8 Interrupt priority level (IPL) 
The current processor priority, in the range 0 to 31. 


7 Virtual machine monitor (VMM). 
When set, the processor is executing in a virtual machine monitor. When clear, the 
processor is running in either real or virtual machine mode. 
Programming Note: 


This bit is only meaningful when running with PALcode that 
implements virtual machine capabilities. 


6-5 Reserved to DIGITAL, MBZ. 


4-3 Current mode (CM)
The access mode of the currently executing process as follows:

0 Kernel
1 Executive
2 Supervisor
3 User


2 Interrupt pending (IP) 
Set when an interrupt (software or hardware but not AST) is initiated; indicates an 
interrupt is in progress. 


1-0 Reserved for Software (SW) 
These bits are reserved for software use and can be read and written at any time by 
the software, regardless of the current mode. The value of these bits is ignored by 
the hardware. The software field is set to zero at the initiation of either an excep- 
tion or an interrupt. 


At bootstrap, the initial value of PS is set to 1F00₁₆. Previous stack alignment is zero, IPL is
31, VMM is clear, CM is kernel, and the SW and IP fields are zero.
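

As an illustration only (not part of the architecture definition), the field layout in Table 6-2
can be expressed as a small decoding routine. The function name decode_ps, the dictionary keys,
and the MODE_NAMES tuple are hypothetical; only the bit positions and the bootstrap value come
from the text above.

```python
# Hypothetical helper: split a 64-bit PS value into the fields of Table 6-2.
MODE_NAMES = ("Kernel", "Executive", "Supervisor", "User")  # CM encodings 0..3


def decode_ps(ps):
    """Return the PS fields as a dictionary of small integers."""
    return {
        "SP_ALIGN": (ps >> 56) & 0x3F,  # bits 61-56, previous stack alignment
        "IPL": (ps >> 8) & 0x1F,        # bits 12-8, interrupt priority level
        "VMM": (ps >> 7) & 0x1,         # bit 7, virtual machine monitor
        "CM": (ps >> 3) & 0x3,          # bits 4-3, current mode
        "IP": (ps >> 2) & 0x1,          # bit 2, interrupt pending
        "SW": ps & 0x3,                 # bits 1-0, reserved for software
    }


boot = decode_ps(0x1F00)  # bootstrap value given in the text
print(boot["IPL"], MODE_NAMES[boot["CM"]])
```

Decoding the bootstrap value gives IPL 31 and kernel mode with every other field zero, which
agrees with the description above.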


6.2.2 Program Counter 
The PC (Figure 6-3) is a 64-bit virtual address. All instructions are aligned on longword
boundaries and, therefore, hardware can assume zero for the two low-order PC bits. The PC is
discussed in Section 6.2.6.


The PC can be explicitly read with the Unconditional Branch (BR) instruction. All branching
instructions also load a new value into the PC.


Figure 6-3: Program Counter (PC)


Bits 63-2: Instruction Virtual Address <63:2>. Bits 1-0: the two low-order PC bits.


6.2.3 Processor Interrupt Priority Level (IPL) 


Each processor has 32 interrupt priority levels (IPLs) divided into 16 software levels (num- 
bered 0 to 15), and 16 hardware levels (numbered 16 to 31). User applications and most 
operating system software run at IPL 0, which may be thought of as process level. Higher num- 
bered interrupt levels have higher priority; that is, any request at an interrupt level higher than 
the processor’s current IPL will interrupt immediately, but requests at lower or equal levels are 
deferred. 


Interrupt levels 0 to 15 exist solely for use by software. No hardware event can request an 
interrupt on these levels. Conversely, interrupt levels 16 to 31 exist solely for use by hardware. 
Serious system failures, such as a machine check abort, however, raise the IPL to the highest 
level (31) to minimize processor interruption until the problem is corrected, and execute in ker- 
nel mode on the kernel stack. 


6.2.4 Protection Modes 


Each processor has four protection modes: kernel, executive, supervisor, and user. Per-page 
memory protection varies as a function of mode (for example, a page can be made read-only 
in user mode, but read-write in supervisor, executive, or kernel mode). 


For each process, a separate stack is associated with each mode. Corruption of one stack does 
not affect use of the other stacks. 


Some instructions, termed privileged instructions, may be executed only in kernel mode. 


6.2.5 Processor Stacks 


Each processor has four stacks. There are four process-specific stacks associated with the four 
modes of the current process. At any given time, only one of these stacks is actively used as 
the current stack. 


6.2.6 Stack Frames 


When an exception, interrupt, or machine check occurs, a stack frame (Figure 6-4) is pushed
on the target stack. Regardless of the type of event notification, this stack frame consists of a
64-byte-aligned structure that contains the saved contents of registers R2..R7, the Program
Counter (PC), and the Processor Status (PS). Registers R2 and R3 are then loaded with the vector
and parameter from the SCB for the exception, interrupt, or machine check. Registers R4 and
R5 may be loaded with data pertaining to the exception, interrupt, or machine check. The specific
data loaded is described below in conjunction with each exception, interrupt, or machine
check; if no specific data is specified, the contents of R4 and R5 are UNPREDICTABLE.
After the stack is built, the contents of registers R6 and R7 are UNPREDICTABLE.


The Program Counter value that is saved in the stack frame is:

• For faults, the instruction that encountered the exception.

• For traps, the next instruction.

• For interrupts and (on a best-effort basis) machine checks, the instruction that would
have been issued if the interrupt or machine-check condition had not occurred.


Return from an exception, interrupt, or machine check is done via the CALL_PAL REI instruction,
which restores the saved values of PC, PS, and R2..R7. Thus, the CALL_PAL REI
instruction:

• For faults, re-executes the faulting instruction.

• For traps, executes the next instruction.

• For interrupts, executes the instruction that would have been executed if the interrupt
had not occurred.

• For machine checks, continues execution from the point at which the machine check
was taken.
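

The following sketch illustrates one way the 64-byte alignment of the frame and the SP_ALIGN
field of the saved PS could interact. It assumes that the frame is built by rounding the stack
pointer down to a 64-byte boundary before allocating eight quadwords (R2..R7, PC, and PS), and
that CALL_PAL REI reverses this by releasing the frame and restoring the low six bits from
SP_ALIGN. The helper names are hypothetical, and the exact PALcode sequence is not given in
this section.

```python
FRAME_BYTES = 8 * 8  # eight quadwords: R2..R7, PC, PS


def push_frame(sp):
    """Return (frame_sp, sp_align) for a frame pushed below stack pointer `sp`.

    Assumption: the stack is aligned down to 64 bytes before the frame is allocated.
    """
    sp_align = sp & 0x3F                    # previous byte alignment, 0..63
    frame_sp = (sp & ~0x3F) - FRAME_BYTES   # aligned base of the new frame
    return frame_sp, sp_align


def pop_frame(frame_sp, sp_align):
    """Undo push_frame: release the frame and restore the previous alignment."""
    return (frame_sp + FRAME_BYTES) | sp_align


frame_sp, sp_align = push_frame(0x10000028)
print(hex(frame_sp), sp_align, hex(pop_frame(frame_sp, sp_align)))
```

Because the frame base is a multiple of 64, the low six bits of the original stack pointer would
otherwise be lost; recording them in the saved PS is what allows the original value to be rebuilt
on return.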


Figure 6-4: Stack Frame


6.3 Exceptions


Exception service routines execute in response to exception conditions caused by software. 
Most exception service routines execute in kernel mode, on the kernel stack; all exception ser- 
vice routines execute at the current processor IPL. Change mode exception routines for 
CHMU/CHMS/CHME execute in the more privileged of the current mode or the target mode 
(U/S/E) on the matching stack. Exception service routines are usually coded to avoid
exceptions; however, nested exceptions can occur.


Types of Exceptions
There are three types of exceptions:


• A fault is an exception condition that occurs during an instruction and leaves the registers
and memory in a consistent state such that elimination of the fault condition and
subsequent re-execution of the instruction will give correct results. Faults are not guar- 
anteed to leave the machine in exactly the same state it was in immediately prior to the 
fault, but rather in a state such that the instruction can be correctly executed if the fault 
condition is removed. The PC saved in the exception stack frame is the address of the 
faulting instruction. A CALL_PAL REI instruction to this PC will reexecute the fault- 
ing instruction. 


• An arithmetic trap is an exception condition that occurs at the completion of the operation
that caused the exception. Because several instructions may be in various stages of
execution at any time, it is possible for multiple arithmetic traps to occur simulta- 
neously. The PC that is saved in the exception frame on traps is that of the next instruc- 
tion that would have been issued if the trapping condition(s) had not occurred. This is 
not necessarily the address of the instruction immediately following the one(s) that 
encountered the trap condition, and the intervening instructions are collectively called 
the trap shadow. See Common Architecture, Chapter 4, Arithmetic Trap Completion, 
for information. 


The intervening instructions may have changed operands or other state used by the 
instruction(s) encountering the trap condition(s). If such is the case, a CALL_PAL
REI instruction to this PC does not reexecute the trapping instruction(s), nor does it 
reexecute any intervening instructions; it simply continues execution from the point at 
which the trap was taken. 


In general, it is difficult to fix up results and continue program execution at the point 
of an arithmetic trap. Software can force a trap to be continued more easily without 
the need for complicated fixup code. This is accomplished by specifying any valid 
qualifier combination that includes the /S qualifier with each such instruction and 
following a set of code-generation restrictions in the code that could cause arithmetic 
traps, allowing those traps to be completed by an OS completion handler. 


The AND of all the exception completion qualifiers for trapping instructions is 
provided to the OS completion handler in the exception summary SWC bit. If SWC is 
set, the OS completion handler may find the trigger instruction by scanning backward 
from the trap PC until each register in the register write mask has been an instruction 
destination. The trigger instruction is the last instruction in I-stream order to get a trap 
before the trap shadow. If the SWC bit is clear, no fixup is possible. (The trigger 
instruction may have been followed by a taken branch, so the trap PC cannot be used 
to find it.) 


• A synchronous trap is an exception condition that occurs at the completion of the operation
that caused the exception (or, if the operation can only be partially carried out, at
the completion of that part of the operation), and no subsequent instruction is issued
before the trap occurs.


Synchronous traps are divided into data alignment traps and all other synchronous
traps.
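

The backward scan described above for arithmetic traps taken with the SWC bit set can be
sketched as follows. As an assumption made purely for illustration, the instruction stream is
modeled as a list in which each entry is the Register Write Mask bit number of that instruction's
destination (or None if it writes no register); a real completion handler would decode the
instructions from memory instead. The function name find_trigger is hypothetical.

```python
def find_trigger(dests, trap_index, write_mask, swc):
    """Scan backward from the trap PC until every register in the mask has been seen.

    dests       -- dests[i] is the mask bit number written by instruction i, or None
    trap_index  -- index of the instruction at the saved (trap) PC
    write_mask  -- Register Write Mask delivered with the trap
    swc         -- Software Completion bit from the exception summary
    Returns the index at which the scan stops, or None if no fixup is possible.
    """
    if not swc:
        return None          # SWC clear: the text states that no fixup is possible
    remaining = write_mask
    for i in range(trap_index - 1, -1, -1):
        d = dests[i]
        if d is not None and remaining & (1 << d):
            remaining &= ~(1 << d)   # this register has now been a destination
            if remaining == 0:
                return i
    return None              # ran out of instructions before the mask was covered


# Five modeled instructions; the mask names F3 (bit 35) and F4 (bit 36).
stream = [35, None, 36, 1, None]
print(find_trigger(stream, 5, (1 << 35) | (1 << 36), swc=True))
```

The sketch scans a straight-line list only. As noted above, the trigger instruction may have been
followed by a taken branch, which is why the scan starts from the trap PC and relies on the
code-generation restrictions rather than on the PC alone.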
6.3.1 Faults


The six types of faults signal that an instruction or its operands are in some way illegal. These 
faults are all initiated in kernel mode and push an exception stack frame onto the stack. Upon 
entry to the exception routine, the saved PC (in the exception stack frame) is the virtual 
address of the faulting instruction. 

The six faults include the Floating Disabled Fault described in the next section and five
memory management faults.


Memory management faults occur when a virtual address translation encounters an exception 
condition. This can occur as the result of instruction fetch or during a load or store operation. 


Immediately following a memory management fault, register R4 contains the exact virtual 
address encountering the fault condition. 


The register R5 contains the "MM Flag" quadword.


"MM Flag" is set as follows:

0000 0000 0000 0000₁₆ for a faulting data read
0000 0000 0000 0001₁₆ for a faulting I-fetch operation
8000 0000 0000 0000₁₆ for a faulting write operation


The faulting instruction is the instruction whose fetch faulted, or the load, store, or PALcode 
instruction that encountered the fault condition. 
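

As a brief illustration, a fault handler could classify the access type from the "MM Flag" value
delivered in R5. The constant and function names below are hypothetical; only the three values
come from the list above.

```python
# "MM Flag" values as listed in the text (delivered in R5 on a memory management fault).
MMF_DATA_READ = 0x0000_0000_0000_0000
MMF_IFETCH = 0x0000_0000_0000_0001
MMF_WRITE = 0x8000_0000_0000_0000


def classify_mm_fault(r5):
    """Map the R5 value to a description of the faulting access."""
    kinds = {
        MMF_DATA_READ: "data read",
        MMF_IFETCH: "instruction fetch",
        MMF_WRITE: "write",
    }
    # Any other value is not described in this section.
    return kinds.get(r5, "unknown")


print(classify_mm_fault(0x8000_0000_0000_0000))
```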


Chapter 3 describes the Alpha memory management architecture in more detail. 


6.3.1.1 Floating Disabled Fault


A Floating Disabled Fault is an exception that occurs when an attempt is made to execute a 
floating-point instruction and the floating-point enable (FEN) bit in the HWPCB is not set. 


6.3.1.2 Access Control Violation (ACV) Fault


An ACV fault is a memory management fault that indicates that an attempted access to a vir- 
tual address was not allowed in the current mode. 


ACV faults usually indicate program errors, but in some cases, such as automatic stack expan- 
sion, can indicate implicit operating system functions. 


ACV faults take precedence over Translation Not Valid, Fault on Read, Fault on Write, and
Fault on Execute faults.


ACV faults take precedence over Translation Not Valid faults so that a malicious user could 
not degrade system performance by causing spurious page faults to pages for which no access 
is allowed. 


6.3.1.3 Translation Not Valid (TNV)


A TNV fault is a memory management fault that indicates that an attempted access was made
to a virtual address whose Page Table Entry (PTE) was not valid.


Software may use TNV faults to implement virtual memory capabilities.


6.3.1.4 Fault on Read (FOR)


An FOR fault is a memory management fault that indicates that an attempted data read access 
was made to a virtual address whose Page Table Entry (PTE) had the Fault on Read bit set. 


As a part of initiating the FOR fault, the processor invalidates the Translation Buffer entry that 
caused the fault to be generated. 


Implementation Note: 


This allows an implementation to invalidate entries only from the Data-stream Translation 
Buffer on Fault on Read faults. 


The Translation Buffer may reload and cache the old PTE value between the time the FOR 
fault invalidates the old value from the Translation Buffer and the time software updates the 
PTE in memory. Software that depends on the processor-provided invalidate must thus be pre- 
pared to take another FOR fault on a page after clearing the page’s PTE<FOR> bit. The 
second fault will invalidate the stale PTE from the Translation Buffer, and the processor can- 
not load another stale copy. Thus, in the worst case, a multiprocessor system will take an 
initial FOR fault and then an additional FOR fault on each processor. In practice, even a single 
repetition is unlikely. 


Software may use FOR faults to implement watchpoints, to collect page usage statistics, and 
to implement execute-only pages. 


6.3.1.5 Fault on Write (FOW)


A FOW fault is a memory management fault that indicates that an attempted data write access 
was made to a virtual address whose Page Table Entry (PTE) had the Fault On Write bit set. 


As a part of initiating the FOW fault, the processor invalidates the Translation Buffer entry 
that caused the fault to be generated. 


Implementation Note: 


This allows an implementation to invalidate entries only from the Data-stream Translation 
Buffer on Fault on Write faults. 


Note that the Translation Buffer may reload and cache the old PTE value between the time the 
FOW fault invalidates the old value from the Translation Buffer and the time software updates 
the PTE in memory. Software that depends on the processor-provided invalidate must thus be 
prepared to take another FOW fault on a page after clearing the page’s PTE<FOW> bit. The 
second fault will invalidate the stale PTE from the Translation Buffer, and the processor can- 
not load another stale copy. Thus, in the worst case, a multiprocessor system will take an 
initial FOW fault and then an additional FOW fault on each processor. In practice, even a single
repetition is unlikely.


Software may use FOW faults to maintain modified page information, to implement copy on
write and watchpoint capabilities, and to collect page usage statistics.
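

The requirement to tolerate a repeated fault can be illustrated with a small model of a Fault on
Write handler that maintains modified-page information. The Pte class and the handle_fow_fault
function are hypothetical names, and the model ignores the Translation Buffer itself; it only
shows a handler that treats a fault on a page whose PTE<FOW> bit is already clear as harmless.

```python
class Pte:
    """Toy page table entry holding only the state needed for this sketch."""

    def __init__(self):
        self.fow = True        # Fault on Write bit set: first write will fault
        self.modified = False  # software-maintained "page was written" flag


def handle_fow_fault(pte):
    """Handle a Fault on Write; return True if the PTE was updated.

    A False return models the repeat fault described in the text: the
    Translation Buffer reloaded the old PTE before software cleared
    PTE<FOW>, so the handler has nothing left to do and simply returns.
    """
    if not pte.fow:
        return False
    pte.modified = True   # record that the page has been written
    pte.fow = False       # later writes proceed without faulting
    return True


page = Pte()
print(handle_fow_fault(page), handle_fow_fault(page), page.modified)
```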


6.3.1.6 Fault on Execute (FOE) 


An FOE fault is a memory management fault that indicates that an attempted instruction 
stream access was made to a virtual address whose Page Table Entry (PTE) had the Fault On 
Execute bit set. 

As a part of initiating the FOE fault, the processor invalidates the Translation Buffer entry that 
caused the fault to be generated. 


Implementation Note: 


This allows an implementation to invalidate entries only from the Instruction-stream 
Translation Buffer on Fault on Execute faults. 


Note that the Translation Buffer may reload and cache the old PTE value between the time the 
FOE fault invalidates the old value from the Translation Buffer and the time software updates 
the PTE in memory. Software that depends on the processor-provided invalidate must thus be
prepared to take another FOE fault on a page after clearing the page’s PTE<FOE> bit. The second
fault will invalidate the stale PTE from the Translation Buffer, and the processor cannot
load another stale copy. Thus, in the worst case, a multiprocessor system will take an initial 
FOE fault and then an additional FOE fault on each processor. In practice, even a single repeti- 
tion is unlikely. 


Software may use FOE faults to implement access mode changes and protected entry to kernel 
mode, to collect page usage statistics, and to detect programming errors that try to execute 
data. 


6.3.2 Arithmetic Traps 


An arithmetic trap is an exception that occurs as the result of performing an arithmetic or con- 
version operation. 


If integer register R31 or floating-point register F31 is specified as the destination of an opera- 
tion that can cause an arithmetic trap, it is UNPREDICTABLE whether the trap will actually 
occur, even if the operation would definitely produce an exceptional result. If the operation 
causes an arithmetic trap, the bit that corresponds to R31 or F31 in the Register Write Mask is 
UNPREDICTABLE. 


Arithmetic traps are initiated in kernel mode and push the exception stack frame on the kernel
stack. The Register Write Mask is saved in R4, and the Exception Summary parameter is
saved in R5. These are described in Section 6.3.2.1.


6.3.2.1 Exception Summary Parameter 


The Exception Summary parameter shown in Figure 6-5 and described in Table 6-3 records
the various types of arithmetic traps that can occur together. These types of traps are described
in subsections below.


Figure 6-5: Exception Summary


Table 6-3: Exception Summary


Bit Description


63-7 Zero. 


6 Integer Overflow (IOV) 
An integer arithmetic operation or a conversion from floating to integer over- 
flowed the destination precision. 


5 Inexact Result (INE)
A floating arithmetic or conversion operation gave a result that differed from the
mathematically exact result.


4 Underflow (UNF)
A floating arithmetic or conversion operation underflowed the destination exponent.


3 Overflow (OVF)
A floating arithmetic or conversion operation overflowed the destination exponent.


2 Division by Zero (DZE)
An attempt was made to perform a floating divide operation with a divisor of zero.


1 Invalid Operation (INV)
An attempt was made to perform a floating arithmetic, conversion, or comparison
operation, and one or more of the operand values were illegal.


0 Software Completion (SWC) 
Set when all of the other arithmetic exception bits were set by floating-operate 
instructions with the /S exception completion qualifier set. See Common Architec- 
ture, Chapter 4, Arithmetic Trap Completion, for rules about setting the /S quali- 
fier in code that may cause an arithmetic trap, and Section 6.3 for rules about using 
the SWC bit in a trap handler. 
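

For illustration, the bit assignments in Table 6-3 can be decoded with a short routine. The
function name and the EXC_BITS tuple are hypothetical; the bit positions are those listed in the
table.

```python
# Exception Summary bit names, indexed by bit number (bit 0 first), per Table 6-3.
EXC_BITS = ("SWC", "INV", "DZE", "OVF", "UNF", "INE", "IOV")


def decode_exception_summary(value):
    """Return the names of the bits that are set in the Exception Summary parameter."""
    return [name for bit, name in enumerate(EXC_BITS) if value & (1 << bit)]


# Bits 0 and 3 set: a floating overflow taken with software completion.
print(decode_exception_summary(0b0001001))
```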


6.3.2.2 Register Write Mask 


The Register Write Mask parameter records all registers that were targets of instructions that
set the bits in the exception summary register. There is a one-to-one correspondence between
bits in the Register Write Mask quadword and the register numbers. The quadword records,
starting at bit 0 and proceeding right to left, which of the registers R0 through R31, then F0
through F31, received an exceptional result.


Note:


For a sequence such as: 


ADDF F1,F2,F3 
MULF F4,F5,F3 


If the add overflows and the multiply does not, the OVF bit is set in the exception 
summary, and the F3 bit is set in the register mask, even though the overflowed sum in F3 
can be overwritten with an in-range product by the time the trap is taken. (This code 
violates the destination reuse rule for software completion. See Common Architecture, 
Chapter 4, Arithmetic Trap Completion, for the destination reuse rules.) 


The PC value saved in the exception stack frame is the virtual address of the next instruction. 
This is defined as the virtual address of the first instruction not executed after the trap condi- 
tion was recognized. 
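

The correspondence between mask bits and registers can be sketched as follows. The function name
is hypothetical; the mapping (bits 0 to 31 for R0 through R31, bits 32 to 63 for F0 through F31)
follows the description of the Register Write Mask above.

```python
def registers_in_mask(mask):
    """List the register names whose bits are set in a Register Write Mask quadword."""
    names = []
    for bit in range(64):
        if mask & (1 << bit):
            # Bits 0-31 name integer registers, bits 32-63 floating-point registers.
            names.append("R%d" % bit if bit < 32 else "F%d" % (bit - 32))
    return names


# In the ADDF/MULF example of the Note, the F3 bit (bit 32 + 3 = 35) is set.
print(registers_in_mask(1 << 35))
```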


6.3.2.3 Invalid Operation (INV) Trap 
An INV trap is reported for most floating-point operate instructions with an input operand that
is a VAX reserved operand, VAX dirty zero, IEEE NaN, IEEE infinity, or IEEE denormal.


Floating INV traps are always enabled. If this trap occurs, the result register is written with an 
UNPREDICTABLE value. 


6.3.2.4 Division by Zero (DZE) Trap 


A DZE trap is reported when a finite number is divided by zero. Floating DZE traps are 
always enabled. If this trap occurs, the result register is written with an UNPREDICTABLE 
value. 


6.3.2.5 Overflow (OVF) Trap 


An OVF trap is reported when the destination’s largest finite number is exceeded in magni- 
tude by the rounded true result. Floating OVF traps are always enabled. If this trap occurs, the 
result register is written with an UNPREDICTABLE value. 


6.3.2.6 Underflow (UNF) Trap 


A UNF trap is reported when the destination’s smallest finite number exceeds in magnitude 
the non-zero rounded true result. Floating UNF trap enable can be specified in each float- 
ing-point operate instruction. If underflow occurs, the result register is written with a true zero. 


6.3.2.7 Inexact Result (INE) Trap 


An INE trap is reported if the rounded result of an IEEE operation is not exact. INE trap
enable can be specified in each IEEE floating-point operate instruction. The unchanged result
value is stored in all cases.


6.3.2.8 Integer Overflow (IOV) Trap


An IOV trap is reported for any integer operation whose true result exceeds the destination reg- 
ister size. IOV trap enable can be specified in each arithmetic integer operate instruction and 
each floating-point convert-to-integer instruction. If integer overflow occurs, the result register 
is written with the truncated true result. 
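

For a signed quadword addition, the "truncated true result" can be modeled as shown below. This
is an arithmetic sketch only: the function name is hypothetical, Python's unbounded integers
stand in for the true result, and no particular instruction encoding is implied.

```python
MASK64 = (1 << 64) - 1


def add_quadword(a, b):
    """Add two signed 64-bit values; return (truncated_result, overflowed)."""
    true_result = a + b                 # exact, unbounded sum
    truncated = true_result & MASK64    # keep only the low 64 bits
    if truncated >= 1 << 63:            # reinterpret bit 63 as the sign bit
        truncated -= 1 << 64
    return truncated, truncated != true_result


print(add_quadword(2**63 - 1, 1))
```

Adding 1 to the largest positive quadword wraps to the most negative quadword, and the overflow
indication is true because the truncated value differs from the exact sum.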


6.3.3 Synchronous Traps


A synchronous trap is an exception condition that occurs at the completion of the operation
that caused the exception (or, if the operation can only be partially carried out, at the completion
of that part of the operation), but no successor instruction is allowed to start. All traps that
are not arithmetic traps are synchronous traps.


Some synchronous traps are caused by PALcode instructions: BPT, BUGCHK, CHMU,
CHMS, CHME, and CHMK. For synchronous traps, the PC saved in the exception stack
frame is the address of the instruction immediately following the one causing the trap condition.
A CALL_PAL REI instruction to this PC will continue without reexecuting the trapping
instruction. The following subsections describe the synchronous traps in detail.


6.3.3.1 Data Alignment Trap


All data must be naturally aligned or an alignment trap may be generated. Natural alignment 
means that data bytes are on byte boundaries, data words are on word boundaries, data long- 
words are on longword boundaries, and data quadwords are on quadword boundaries. 


A Data Alignment trap is generated by the hardware when an attempt is made to load or store 
a word, a longword, or a quadword to/from a register using an address that does not have the 
natural alignment of the particular data reference. 
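

The natural alignment rule amounts to requiring that the address be a multiple of the operand
size. A minimal sketch follows; the names are hypothetical, and the sizes are the usual byte
counts for a word (2), longword (4), and quadword (8).

```python
# Operand sizes in bytes for the data types named in the text.
SIZES = {"byte": 1, "word": 2, "longword": 4, "quadword": 8}


def is_naturally_aligned(address, kind):
    """Return True if `address` is a multiple of the size of the named data type."""
    return address % SIZES[kind] == 0


# A quadword at address 0x1004 is longword aligned but not quadword aligned.
print(is_naturally_aligned(0x1004, "longword"), is_naturally_aligned(0x1004, "quadword"))
```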


Data Alignment traps are fixed up by the PALcode and are optionally reported to the operating
system under the control of the DAT bit. If the bit is zero, the trap will be reported. If the bit is
set, after the alignment is corrected, control is returned to the user. In either case, if the PALcode
detects a LDx_L or STx_C instruction, no correction is possible, and an illegal operand
exception is generated.


Note: 


In the case of concurrently pending data alignment and arithmetic traps, it is assumed that 
the arithmetic trap is reported before PALcode data alignment fixup is performed. 
Otherwise, it would not be possible to back up the PC for the synchronous data alignment 
trap as required by Section 6.7.4. 


The system software is notified via the generation of a kernel mode exception through the
Unaligned_Access SCB vector (280₁₆). The virtual address of the unaligned data being
accessed is stored in R4. R5 indicates whether the operation was a read or a write (0 =
read/load, 1 = write/store).


PALcode may write partial results to memory without probing to make sure all writes will
succeed when dealing with unaligned store operations.


If a memory management exception condition occurs while reading or writing part of the
unaligned data, the appropriate memory management fault is generated.


Software should avoid data misalignment whenever possible since the emulation performance 
penalty may be as large as 100-to-1. 


The Data Alignment trap control bit is included in the HWPCB at offset HWPCB[56], bit 63. 
In order to change this bit for the currently executing process, the DATFX IPR may be written 
by using a CALL_PAL MTPR_DATFX instruction. This operation will also update the value 
in the HWPCB. 


6.3.3.2 Other Synchronous Traps 


With the traps described in this subsection, the SCB vector quadword is saved in R2 and the 
SCB parameter quadword is saved in R3. The change mode traps are initiated in the more priv- 
ileged of the current mode and the target mode, while the other traps are initiated in kernel 
mode. 


6.3.3.2.1 Breakpoint Trap 


A Breakpoint trap is an exception that occurs when a CALL_PAL BPT instruction is executed 
(see Section 2.1.1). Breakpoint traps are intended for use by debuggers and can be used to 
place breakpoints in a program. 


Breakpoint traps are initiated in kernel mode so that system debuggers can capture breakpoint 
traps that occur while the user is executing system code. 


6.3.3.2.2 Bugcheck Trap 


A Bugcheck trap is an exception that occurs when a CALL_PAL BUGCHK instruction is exe- 
cuted (see Section 2.1.2). Bugchecks are used to log errors detected by software. 


6.3.3.2.3 Illegal Instruction Trap 


An Illegal Instruction trap is an exception that occurs when an attempt is made to execute an 
instruction when: 

• It has an opcode that is reserved to DIGITAL or reserved to PALcode.

• It is a subsetted opcode that requires emulation on the host implementation.

• It is a privileged instruction and the current mode is not kernel.

• It has an unused function code for those opcodes defined as reserved in the Version 5
Alpha architecture specification (May 1992).


6.3.3.2.4 Illegal Operand Trap 


An Illegal Operand trap occurs when an attempt is made to execute PALcode with operand 
values that are illegal or reserved for future use by DIGITAL. Illegal operands include: 


• An invalid combination of bits in the PS restored by the CALL_PAL REI instruction.

• An unaligned operand passed to PALcode.


6.3.3.2.5 Generate Software Trap


A Generate Software trap is an exception that occurs when a CALL_PAL GENTRAP instruc- 
tion is executed (see Section 2.1.8). The intended use is for low-level compiler-generated code 
that detects conditions such as divide-by-zero, range errors, subscript bounds, and negative 
string lengths. 


6.3.3.2.6 Change Mode to Kernel Trap 


A Change Mode to Kernel trap is an exception that occurs when a CALL_PAL CHMK instruc- 
tion is executed (see Section 2.1.4). Change Mode to Kernel traps are initiated in kernel mode 
and push the exception frame on the kernel stack. 


6.3.3.2.7 Change Mode to Executive Trap 


A Change Mode to Executive trap is an exception that occurs when a CALL_PAL CHME 
instruction is executed (see Section 2.1.3). Change Mode to Executive traps are initiated in the 
more privileged of the current mode and Executive mode, and push the exception frame on the 
target stack. 


6.3.3.2.8 Change Mode to Supervisor Trap 


A Change Mode to Supervisor trap is an exception that occurs when a CALL_PAL CHMS 
instruction is executed (see Section 2.1.5). Change Mode to Supervisor traps are initiated in 
the more privileged of the current mode and supervisor mode, and push the exception frame 
on the target stack. 


6.3.3.2.9 Change Mode to User Trap 


A Change Mode to User trap is an exception that occurs when a CALL_PAL CHMU instruc- 
tion is executed (see Section 2.1.6). Change Mode to User traps are initiated in the more 
privileged of the current mode and user mode, and push the exception frame on the target 
stack. 
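

The rule shared by the change mode traps ("the more privileged of the current mode and the
target mode") can be sketched with the CM encodings from Table 6-2, where a smaller number
denotes a more privileged mode. The function name is hypothetical.

```python
# Mode encodings from the PS<CM> field: 0 is the most privileged.
KERNEL, EXECUTIVE, SUPERVISOR, USER = range(4)


def change_mode_new_mode(current_mode, target_mode):
    """Return the mode in which a change mode trap is initiated."""
    # The more privileged mode has the numerically smaller encoding.
    return min(current_mode, target_mode)


# CHME from user mode enters executive mode; CHMU from kernel mode stays in kernel mode.
print(change_mode_new_mode(USER, EXECUTIVE), change_mode_new_mode(KERNEL, USER))
```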


6.4 Interrupts 


The processor arbitrates interrupt requests according to priority. When the priority of an inter- 
rupt request is higher than the current processor IPL, the processor will raise the IPL and 
service the interrupt request. The interrupt service routine is entered at the IPL of the interrupt- 
ing source, in kernel mode, and on the kernel stack. Interrupt requests can come from I/O 
devices, memory controllers, other processors, or the processor itself. 


The priority level of one processor does not affect the priority level of other processors. Thus, 
in a multiprocessor system, interrupt levels alone cannot be used to synchronize access to 
shared resources. 


Synchronization with other processors in a multiprocessor system involves a combination of 
raising the IPL and executing an interlocking instruction sequence. Raising the IPL prevents 
the synchronization sequence itself from being interrupted on a single processor while the 
interlock sequence guarantees mutual exclusion with other processors. Alternatively, one
processor can issue explicit interprocessor interrupts (and wait for acknowledgment) to put other
processors in a known software state, thus achieving mutual exclusion.


In some implementations, several instructions may be in various stages of execution simultaneously.
Before the processor can service an interrupt request, all active instructions must be
allowed to complete without exception. Thus, when an exception occurs in a currently active 
instruction, the exception is initiated and the exception stack frame built immediately before 
the interrupt is initiated and its stack frame built. 


The following events will cause an interrupt: 

• Software interrupts — IPL 1 to 15 
• Asynchronous System Traps — IPL 2 
• Passive Release interrupts — IPL 20 to 23 
• I/O Device interrupts — IPL 20 to 23 
• Interval Clock interrupt — IPL 22 
• Interprocessor interrupt — IPL 22 
• Performance Monitor interrupt — IPL 29 
• Powerfail interrupt — IPL 30 
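

The arbitration rule stated at the start of this section (a request is serviced only when its priority is higher than the current processor IPL) can be sketched as follows. This is an illustration only: pending requests are represented as a plain collection of IPL values, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def next_interrupt(current_ipl: int, pending_ipls: Iterable[int]) -> Optional[int]:
    """Return the IPL of the request to service now, or None if none qualifies.

    A request is taken only if its IPL is strictly higher than the
    current processor IPL; the highest such request wins.
    """
    highest = max(pending_ipls, default=None)
    if highest is not None and highest > current_ipl:
        return highest
    return None
```

For example, with an AST (IPL 2), an I/O device (IPL 20), and the interval clock (IPL 22) all pending at IPL 0, the clock is serviced first. At IPL 22 none of them is serviced.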


Interrupts are initiated in kernel mode and push the interrupt stack frame of eight quadwords 
onto the kernel stack. The PC saved in the interrupt stack frame is the virtual address of the 
first instruction not executed after the interrupt condition was recognized. A CALL_PAL REI 
instruction to the saved PC/PS will continue execution at the point of interrupt. 


Each interrupt source has a separate vector location (offset) within the System Control Block 
(SCB). (See Section 6.6.) With the exception of I/O device interrupts, each of the above events 
has a unique fixed vector. I/O device interrupts occupy a range of vectors that can be both stati- 
cally and dynamically assigned. Upon entry to the interrupt service routine, R2 contains the 
SCB vector quadword and R3 contains the SCB parameter quadword. For Corrected Error 
interrupts, R4 optionally locates additional information (see Section 6.5.2). 


In order to reduce interrupt overhead, no memory mapping information is changed when an 
interrupt occurs. Therefore, the instructions, data, and the contents of the interrupt vector for 
the interrupt service routine must be present in every process at the same virtual address. 


Interrupt service routines should follow the discipline of not lowering IPL below their initial 
level. Lowering IPL in this way could result in an interrupt at an intermediate level, which 
would cause the stack nesting to be incorrect. 


Kernel mode software may need to raise and lower IPL during certain instruction sequences 
that must synchronize with possible interrupt conditions (such as powerfail). This can be 
accomplished by specifying the desired IPL and executing a CALL_PAL MTPR_IPL instruc- 
tion or by executing a CALL_PAL REI instruction that restores a PS that contains the desired 
IPL (see Section 2.6.5). 


6.4.1 Software Interrupts — IPLs 1 to 15 


6.4.1.1 Software Interrupt Summary Register 


The architecture provides fifteen priority interrupt levels for use by software (level 0 is also 
available for use by software but interrupts can never occur at this level). The Software Inter- 
rupt Summary Register (SISR) stores a mask of pending software interrupts. Bit positions in 
this mask that contain a 1 correspond to the levels on which software interrupts are pending. 


When the processor IPL drops below that of the highest requested software interrupt, a soft- 
ware interrupt is initiated and the corresponding bit in the SISR is cleared. 


The SISR is a read-only internal processor register that may be read by kernel mode software 
by executing a CALL_PAL MFPR_SISR instruction (see Section 5.3). 


6.4.1.2 Software Interrupt Request Register 


The Software Interrupt Request Register (SIRR) is a write-only internal processor register 
used for making software interrupt requests. 


Kernel mode software may request a software interrupt at a particular level by executing a 
CALL_PAL MTPR_SIRR instruction (see Section 5.3). 


If the requested interrupt level is greater than the current IPL, the interrupt will occur before 
the execution of the next instruction. If, however, the requested level is equal to or less than 
the current processor IPL, the interrupt request will be recorded in the Software Interrupt Sum- 
mary Register (SISR) and deferred until the processor IPL drops to the appropriate level. 
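

The interaction between SIRR requests, the SISR pending mask, and the processor IPL can be modeled with a small sketch. The class and method names are hypothetical, and the model ignores all other interrupt sources. It only shows when a request is delivered immediately and when it is deferred.

```python
class SoftwareInterrupts:
    """Toy model of SIRR requests and the SISR pending mask (levels 1 to 15)."""

    def __init__(self, ipl: int = 0) -> None:
        self.ipl = ipl
        self.sisr = 0         # bit n set means a request is pending at level n
        self.delivered = []   # levels delivered so far, in order

    def write_sirr(self, level: int) -> None:
        """Model CALL_PAL MTPR_SIRR: request a software interrupt."""
        if not 1 <= level <= 15:
            raise ValueError("software interrupt levels are 1 to 15")
        self.sisr |= 1 << level   # a duplicate request leaves no extra trace
        self._deliver()

    def set_ipl(self, ipl: int) -> None:
        """Model a change of processor IPL (for example by REI or MTPR_IPL)."""
        self.ipl = ipl
        self._deliver()

    def _deliver(self) -> None:
        # Deliver the highest pending level only if it exceeds the current IPL.
        if self.sisr == 0:
            return
        highest = self.sisr.bit_length() - 1
        if highest > self.ipl:
            self.sisr &= ~(1 << highest)   # the SISR bit is cleared on initiation
            self.delivered.append(highest)
            self.ipl = highest             # the handler runs at the interrupt's IPL
```

In this model, a level 3 request made at IPL 8 stays pending in the SISR until the IPL drops below 3, whereas a level 10 request made at IPL 8 is delivered at once.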


Note that no indication is given if there is already a request at the specified level. Therefore, 
the respective interrupt service routine must not assume that there is a one-to-one correspon- 
dence between interrupts requested and interrupts generated. A valid protocol for generating 
this correspondence is: 


1. The requester places information in a control block and then inserts the control block in 
a queue associated with the respective software interrupt level. 


2. The requester uses CALL_PAL MTPR_SIRR to request an interrupt at the appropriate 
level. 


3. When enabling conditions arise, processor hardware clears the appropriate SISR bit as
part of initiating the software interrupt. 


4. The interrupt service routine attempts to remove a control block from the request queue. 
If there are no control blocks in the queue, the interrupt is dismissed with a CALL_PAL 
REI instruction. 


5. If a valid control block is removed from the queue, the requested service is performed 
and step 4 is repeated. 
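

The queue protocol above can be sketched as follows. This is a single-threaded illustration with hypothetical names. Real kernel code would use an interlocked queue, and the request counter merely stands in for CALL_PAL MTPR_SIRR.

```python
from collections import deque


class LevelQueue:
    """Request queue associated with one software interrupt level."""

    def __init__(self) -> None:
        self.blocks = deque()   # queued control blocks
        self.requests = 0       # stand-in for MTPR_SIRR calls

    def request(self, control_block) -> None:
        self.blocks.append(control_block)   # queue the control block
        self.requests += 1                  # then request the interrupt

    def service(self, handle) -> list:
        """Interrupt service routine: drain the queue, then dismiss."""
        done = []
        while self.blocks:                  # remove blocks until none remain
            done.append(handle(self.blocks.popleft()))
        return done                         # an empty queue means dismiss (REI)
```

Two requests may collapse into a single interrupt, so one invocation of the service routine handles every queued block. A later invocation that finds the queue empty simply dismisses the interrupt.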


6.4.2 Asynchronous System Trap — IPL 2 


Asynchronous System Traps (ASTs) are a means of notifying a process of events that are not 
synchronized with its execution, but that must be dealt with in the context of the process. An 
AST is initiated in kernel mode at IPL 2 when the current mode is less privileged than or equal 
to a mode for which an AST is pending and not disabled, with PS<IPL> less than 2 (see Sec- 
tions 6.7.6 and 4.3). 


There are four separate per-mode SCB vectors, one for each of kernel, executive, supervisor, 
and user modes. 


On encountering an AST, the interrupt stack frame is pushed on the kernel stack. The value of 
the PC saved in this stack frame is the address of the next instruction to have been executed if 
the interrupt had not occurred. The SCB vector quadword is saved in R2 and the SCB parame- 
ter quadword in R3. 


6.4.3 Passive Release Interrupts — IPLs 20 to 23 


Passive releases occur when the source of an interrupt granted by a processor cannot be deter- 
mined. This can happen when the requesting I/O device determines that it no longer requires 
an interrupt after requesting one or when a previously requested interrupt has already been ser- 
viced by another processor in some multiprocessor configurations. The interrupt handler for 
passive releases executes at the priority level of the interrupt request. 


6.4.4 I/O Device Interrupts — IPLs 20 to 23 


The architecture provides four priority levels for use by I/O devices. I/O device interrupts are 
requested when the device encounters a completion, attention, or error condition and the 
respective interrupt is enabled. See Console Interface (III), Chapter 2, for more information. 


6.4.5 Interval Clock Interrupt — IPL 22 


The interval clock requests an interrupt periodically. 


At least 1000 interval clock interrupts occur per second. An entry in the HWRPB contains the 
number of interval clock interrupts per second that occur in an actual Alpha implementation, 
scaled up by 4096, and rounded to a 64-bit integer. (See Console Interface (III), Chapter 2.) 


The accuracy of the interval clock must be at least 50 parts per million (ppm). 


Hardware/Software Note: 


For example, an interval of 819.2 µsec derived from a 10 MHz Ethernet clock and a 13-bit 
counter is acceptable. 

To guarantee software progress, the interval clock interrupt should be no more frequent 
than the time it takes to do 500 main memory accesses. Over the life of the architecture, 
this interval may well decrease much more slowly than CPU cycle time decreases. 


Other constraints may apply to secure kernel systems. 
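

The arithmetic behind the 819.2 µsec example and the scaled HWRPB value can be worked through in a short sketch. It assumes that the interrupt is raised each time the counter wraps, so the clock is divided by 2 to the power of the counter width. The function names are hypothetical.

```python
from fractions import Fraction


def interval_microseconds(clock_hz: int, counter_bits: int) -> Fraction:
    """Interval between clock interrupts, in microseconds."""
    return Fraction(2 ** counter_bits * 1_000_000, clock_hz)


def scaled_clock_frequency(clock_hz: int, counter_bits: int) -> int:
    """Interrupts per second, scaled up by 4096 and rounded to an integer."""
    interrupts_per_second = Fraction(clock_hz, 2 ** counter_bits)
    return round(interrupts_per_second * 4096)
```

For a 10 MHz clock and a 13-bit counter, the interval is 8192/10 = 819.2 microseconds, which is about 1220.7 interrupts per second and satisfies the minimum of 1000. The scaled value works out to exactly 5,000,000.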


6.4.6 Interprocessor Interrupt — IPL 22 


Interprocessor interrupts are provided to enable operating system software running on one pro- 
cessor to interrupt activity on another processor and cause operating system-dependent actions 
to be performed. 


6.4.6.1 Interprocessor Interrupt Request Register 


The Interprocessor Interrupt Request Register (IPIR) is a write-only internal processor register 
used for making a request to interrupt a specific processor. 


Kernel mode software may request to interrupt a particular processor by executing a 
CALL_PAL MTPR_IPIR instruction (see Section 5.3). 


If the specified processor is the same as the current processor and the current IPL is less than 
22, then the interrupt may be delayed and not initiated before the execution of the next 
instruction. 


Note that, as with software interrupts, no indication is given as to whether there is already an 
interprocessor interrupt pending when one is requested. Therefore, the interprocessor interrupt 
service routine must not assume there is a one-to-one correspondence between interrupts 
requested and interrupts generated. A valid protocol similar to the one for software interrupts 
for generating this correspondence is: 


1. The requester places information in a control block and then inserts the control block in 
a queue associated with the target processor. 


2. The requester uses CALL_PAL MTPR_IPIR to request an interprocessor interrupt on 
the target processor. 


3. The interprocessor interrupt service routine on the target processor attempts to remove a 
control block from its request queue. If there are no control blocks remaining, the
interrupt is dismissed with a CALL_PAL REI instruction. 


4. If a valid control block is removed from the queue, the specified action is performed 
and step 3 is repeated. 


6.4.7 Performance Monitor Interrupts — IPL 29 


These interrupts provide some of the support for processor or system performance measure- 
ments. The implementation is processor or system specific. 


6.4.8 Powerfail Interrupt — IPL 30 


If the system power supply backup option permits powerfail recovery, a powerfail interrupt is 
generated to each processor when power is about to fail. See Console Interface (III), Chapter 3, 
for a description of powerfail recovery requirements and for a description of the interactions 
between system software and the console during system restarts. 


In systems in which the backup option maintains only the contents of memory and keeps sys- 
tem time with the BB_WATCH, the power supply requests a powerfail interrupt to permit 
volatile system state to be saved. Prior to dispatching to the powerfail interrupt service routine, 
PALcode is responsible for saving all system state that is not visible to system software. Such 
state includes, but is not limited to, processor internal registers and PALcode temporary 
variables. 


PALcode is also responsible for saving the contents of any write-back caches or buffers, 
including the powerfail interrupt stack frame. System software is responsible for saving all 
other system state. Such state includes, but is not limited to, processor registers and write-back 
cache contents. State can be saved by forcing all written data to a backed-up part of the mem- 
ory subsystem; software may use the CALL_PAL CFLUSH instruction. 


The powerfail interrupt will not be initiated until the processor IPL drops below 30. Thus, criti- 
cal code sequences can block the power-down sequence by raising the IPL to 31. Software, 
however, must take extra care not to lock out the power-down sequence for an extended 
period of time. The time interval is platform specific. 


Explicit state is not provided by the architecture for software to directly determine whether 
there were outstanding interrupts when powerfail occurred. It is the responsibility of software 
to leave sufficient information in memory so that it may determine the proper action on 
power-up. 


6.5 Machine Checks 


A machine check, or mcheck, indicates that a hardware error condition was detected and may 
or may not be successfully corrected by hardware or PALcode. Such error conditions can 
occur either synchronously or asynchronously with respect to instruction execution. There are 
four types: 


1. System Machine Check (IPL 31) 


These machine checks are generated by error conditions that are detected 
asynchronously to processor execution but are not successfully corrected by hardware 
or PALcode. Examples of system machine check conditions include protocol errors on 
the processor-memory-interconnect (PMI) and unrecoverable memory errors. 


System machine checks are always maskable and deferred until processor IPL drops 
below IPL 31. 


2. Processor Machine Check (IPL 31) 


These machine checks indicate that a processor internal error was detected and not 
successfully corrected by hardware or PALcode. Examples of processor machine 
check conditions include processor internal cache errors, translation buffer parity 
errors, or read access to a nonexistent local I/O space location (NXM). 


Processor machine checks may be nonmaskable or maskable. If nonmaskable, they are 
initiated immediately, even if the processor IPL is 31. If maskable, they are deferred 
until processor IPL drops below IPL 31. 


3. System Correctable Machine Check (IPL 20) 


These machine checks are generated by error conditions that are detected 
asynchronously to processor execution and are successfully corrected by hardware or 
PALcode. Examples of system correctable machine check conditions include 
single-bit errors within the memory subsystem. 


System correctable machine checks are always maskable and deferred until processor 
IPL drops below IPL 20. 


4. Processor Correctable Machine Check (IPL 31) 


These machine checks indicate that a processor internal error was detected and 
successfully corrected by hardware or PALcode. Examples of processor correctable 
machine check conditions include corrected processor internal cache errors and 
corrected translation buffer table errors. 


Processor correctable machine checks may be nonmaskable or maskable. If 
nonmaskable, they are initiated immediately, even if the processor IPL is 31. If 
maskable, they are deferred until processor IPL drops below IPL 31. 
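

The masking rules for the four machine check types can be summarized in a short sketch. The dictionary keys and the function name are hypothetical. The IPL values and the "always maskable" property are taken from the list above.

```python
# IPL associated with each machine check type, as listed above.
MCHECK_IPL = {
    "system": 31,
    "processor": 31,
    "system_correctable": 20,
    "processor_correctable": 31,
}

# System machine checks of both kinds are always maskable.
ALWAYS_MASKABLE = {"system", "system_correctable"}


def is_deferred(kind: str, current_ipl: int, maskable: bool = True) -> bool:
    """Return True if the machine check must wait for the IPL to drop."""
    if kind in ALWAYS_MASKABLE:
        maskable = True
    # A maskable check is deferred until the IPL drops below its own level.
    return maskable and current_ipl >= MCHECK_IPL[kind]
```

A nonmaskable processor machine check is therefore never deferred, even at IPL 31, while a system correctable machine check waits whenever the processor is at IPL 20 or above.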


Machine checks are initiated in kernel mode, on the kernel stack, and cannot be disabled. 


Correctable machine checks permit the pattern and frequency of certain errors to be captured. 
The delivery of these machine checks to system software can be disabled by setting IPR 
MCES<4:3>, as described in Section 5.3.9. Note that setting IPR MCES<4:3> does not dis- 
able the generation of the machine check or the correction of the error, but rather suppresses 
the reporting of that correction to system software. 


The PC in the machine check stack frame is that of the next instruction that would have issued 
if the machine check condition had not occurred. This is not necessarily the address of the 
instruction immediately following the one encountering the error, and intervening instructions 
may have changed operands or other state used by the instruction encountering the error condi- 
tion. A CALL_PAL REI instruction to this PC will simply continue execution from the point 
at which the machine check was taken. 


Note: 


On machine checks, a meaningful PC is delivered on a best-effort basis. The machine 
state, processor registers, memory, and I/O devices may be indeterminate. 


Machine checks may be deliberately generated by software, such as by probing nonexistent 
memory during memory sizing or searching for local I/O devices. In such a case, the DRAINA 
PALcode instruction can be called to force any outstanding machine checks to be taken before 
continuing. 


6.5.1 Software Response 


The reaction of system software to machine checks is specific to the characteristics of the pro- 
cessor, platform, and system software. System software must determine if operation should be 
discontinued on an implementation-specific basis. 


To assist system software, PALcode provides a retry flag in the machine check logout frame 
(see Figure 6-6). If the retry flag is set, the state of the processor and platform hardware has 
not been compromised; system software operation should be able to continue. 


If the retry flag is clear, the state of the processor is either unknown or is known to have been 
updated during partial execution of one or more instructions. System software operation can 
continue only after system software determines that the hardware state change permits and/or 
takes corrective action. 


PALcode should take appropriate implementation-specific actions prior to setting the retry 
flag. PALcode should also attempt to ensure that each encountered error condition generates 
only one machine check. 


Implementation Note: 


An important example of using the retry flag is read NXM. Also, a read NXM should not 
generate both a Processor Machine Check and a System Machine Check. 


PALcode sets an internal Machine-Check-In-Progress flag in the Machine Check Error Sum- 
mary (MCES) register prior to initiating a system or processor machine check. System 
software must clear that flag to dismiss the machine check. If a second uncorrectable machine 
check hardware error condition is detected while the flag is set, or if PALcode cannot deliver 
the machine check, PALcode forces the processor to enter console I/O mode, and subsequent 
actions, such as processor restart, are taken by the console. The REASON FOR HALT code is 
"double error abort encountered." See Console Interface (III), Chapter 3. 


Similarly, PALcode sets an internal correctable Machine-Check-In-Progress flag in the 
Machine Check Error Summary (MCES) register prior to initiating a system-correctable error 
interrupt or processor-correctable machine check. System software must clear that flag to dis- 
miss the condition and permit the reuse of the logout area. If a second correctable hardware 
error condition is detected while the flag is set, the error is corrected, but not reported. PAL- 
code does not overwrite the logout area and the processor remains in program I/O mode. 
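

The in-progress flag handling described above can be sketched as a pair of small functions. The bit positions follow the MCES<0>, MCES<1>, and MCES<2> assignments used by the model in Section 6.7.5. The constant names, function names, and returned action strings are hypothetical.

```python
MCHECK_IN_PROGRESS = 1 << 0   # MCES<0>: machine check in progress
SCE = 1 << 1                  # MCES<1>: system correctable error in progress
PCE = 1 << 2                  # MCES<2>: processor correctable error in progress


def report_uncorrectable(mces: int):
    """Return (new_mces, action) for an uncorrectable error condition."""
    if mces & MCHECK_IN_PROGRESS:
        # A second error while the flag is set forces console I/O mode.
        return mces, "enter console"
    return mces | MCHECK_IN_PROGRESS, "machine check"


def report_correctable(mces: int, flag: int):
    """Return (new_mces, action) for a corrected error (flag is SCE or PCE)."""
    if mces & flag:
        # The error is still corrected, but the logout area is left untouched.
        return mces, "corrected, not reported"
    return mces | flag, "reported"
```

In both cases system software is expected to clear the relevant flag when it dismisses the condition. That step is not modeled here.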


6.5.2 Logout Areas 


When a hardware error condition is encountered, PALcode optionally builds a logout frame 
prior to passing control to the machine check service routine. The logout frame is shown in 
Figure 6-6 and described in Table 6-4. The logout frame is built in the logout area located by 
the processor’s per-CPU slot in the HWRPB (see Console Interface (III), Chapter 2). 


Figure 6-6: Corrected Error and Machine Check Logout Frame 

[Figure: frame layout with bit positions 63, 62, 61, 32, 31, and 0 marked; the frame ends at
offset +FRAME SIZE.]


Table 6-4: Corrected Error and Machine Check Logout Frame Fields 

Offset          Description 

FRAME           FRAME SIZE — Size in bytes of the logout frame, including the 
                FRAME SIZE longword. 

+04             FRAME FLAGS — Informational flags. 

                Bit     Description 

                31      RETRY FLAG — Indicates whether execution can be resumed after 
                        dismissing this machine check. Set on Corrected Error interrupts; 
                        may be set on machine checks. 

                30      SECOND ERROR FLAG — Indicates that a second correctable error 
                        was encountered. Set on Corrected Error interrupts when a 
                        correctable error was encountered while the relevant correctable 
                        error bit (PCE or SCE) is set in the MCES register. Clear on 
                        machine checks. 

                29-0    SBZ. 

+08             CPU OFFSET — Offset in bytes from the base of the logout frame to the 
                CPU-specific information. If CPU OFFSET is equal to 16, the frame 
                contains no PALcode-specific information. If CPU OFFSET is equal to 
                SYS OFFSET, the frame contains no CPU-specific information. 

+12             SYS OFFSET — Offset in bytes from the base of the logout frame to the 
                system-specific information. If SYS OFFSET is equal to FRAME SIZE, 
                the frame contains no system-specific information. 

+16             PALCODE INFORMATION — PALcode-specific logout information. 

+CPU OFFSET     CPU INFORMATION — CPU-specific logout information. 

+SYS OFFSET     SYS INFORMATION — System platform-specific logout information. 


The logout frame is optional; the service routine uses R4 to locate the frame, if any. Upon 
entry to the service routine, R4 contains the byte offset of the logout frame from the base of 
the logout area. If no frame was built, R4 contains -1. 
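

The fixed header of the logout frame can be decoded as in the following sketch. It assumes that the four header fields are little-endian longwords at offsets 0, 4, 8, and 12, and that the caller has already located the frame (for example, by adding a non-negative R4 offset to the base of the logout area). The function name and dictionary keys are hypothetical.

```python
import struct


def parse_logout_header(frame: bytes) -> dict:
    """Split a logout frame into its header fields and three data areas."""
    # Assumed layout: FRAME SIZE, FRAME FLAGS, CPU OFFSET, SYS OFFSET.
    frame_size, flags, cpu_offset, sys_offset = struct.unpack_from("<4I", frame, 0)
    return {
        "frame_size": frame_size,
        "retry": bool(flags & (1 << 31)),         # RETRY FLAG, bit 31
        "second_error": bool(flags & (1 << 30)),  # SECOND ERROR FLAG, bit 30
        "palcode_info": frame[16:cpu_offset],     # empty if CPU OFFSET is 16
        "cpu_info": frame[cpu_offset:sys_offset], # empty if the offsets match
        "sys_info": frame[sys_offset:frame_size], # empty if SYS OFFSET = FRAME SIZE
    }
```

The slicing mirrors the rules in Table 6-4: each area is empty exactly when its starting offset equals the start of the next area.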


6.6 System Control Block 


The System Control Block (SCB) specifies the entry points for exception, interrupt, and 
machine check service routines. The block is from 8K to 32K bytes long, must be page 
aligned, and must be physically contiguous. The PFN is specified by the value of the System 
Control Block Base (SCBB) internal register. 


The SCB, shown in Figure 6-7, consists of from 512 to 2048 entries, each 16 bytes long. The 
first eight bytes of an entry, the vector, specify the virtual address of the service routine associ- 
ated with that entry. The second eight bytes, the parameter, are an arbitrary quadword value to 
be passed to the service routine. 
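

The entry layout can be illustrated with a sketch that reads one vector and parameter pair from an in-memory copy of the SCB. It assumes little-endian quadwords and a byte offset relative to the start of the block. The function name is hypothetical.

```python
import struct
from typing import Tuple

ENTRY_SIZE = 16   # bytes per SCB entry: 8-byte vector plus 8-byte parameter


def scb_entry(scb: bytes, offset: int) -> Tuple[int, int]:
    """Return (vector, parameter) for the SCB entry at the given byte offset."""
    if offset % ENTRY_SIZE or not 0 <= offset <= len(scb) - ENTRY_SIZE:
        raise ValueError("offset must select a whole SCB entry")
    vector, parameter = struct.unpack_from("<2Q", scb, offset)
    return vector, parameter
```

On entry to a service routine, these two quadwords are the values delivered in R2 and R3. A minimum-size SCB of 512 entries occupies 512 x 16 = 8192 bytes.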


Figure 6-7: System Control Block Summary 

[Figure: SCB layout by byte offset₁₆, showing the ranges 000-0F0, 200-230, 240-270, 280-3F0,
400-4F0, 500-5F0, 600-6F0, 700-7F0, and 800-7FF0.]


The SCB entries are grouped as follows: 

• Faults 
• Arithmetic traps 
• Asynchronous system traps 
• Data alignment trap 
• Other synchronous traps 
• Processor software interrupts 
• Processor hardware interrupts and machine checks 
• I/O device interrupts 

The first 512 entries (offsets 0000 through 800₁₆) contain all architecturally defined and any 
statically allocated entries. All remaining SCB entries, if any, are used only for those I/O 
device interrupt vectors that are assigned dynamically by system software. It is the responsibil- 
ity of that software to ensure the consistency of the assigned vector and the SCB entry. 


6.6.1 SCB Entries for Faults 


The exception handler for a fault executes with the IPL unchanged, in kernel mode, on the
kernel stack. Table 6-5 lists the SCB entries for faults. 


Table 6-5: SCB Entries for Faults 


Byte offset₁₆   Entry name 

000             Unused 
010             Floating Disabled fault 
020-070         Unused 
080             Access Control Violation fault 
090             Translation Not Valid fault 
0A0             Fault on Read fault 
0B0             Fault on Write fault 
0C0             Fault on Execute fault 
0D0-0F0         Unused 


6.6.2 SCB Entries for Arithmetic Traps 


The exception handler for an arithmetic trap executes with the IPL unchanged, in kernel mode, 
on the kernel stack. Table 6-6 lists the SCB entries for arithmetic traps. 


Table 6-6: SCB Entries for Arithmetic Traps 


Byte offset₁₆   Entry name 

200             Arithmetic Trap 
210-230         Unused 


6.6.3 SCB Entries for Asynchronous System Traps (ASTs) 


The interrupt handler for an asynchronous system trap executes at IPL 2, in kernel mode, on 
the kernel stack. Table 6-7 lists the SCB entries for asynchronous system traps. 


Table 6-7: SCB Entries for Asynchronous System Traps 


Byte offset₁₆   Entry name 

240             Kernel Mode AST 
250             Executive Mode AST 
260             Supervisor Mode AST 
270             User Mode AST 


6.6.4 SCB Entries for Data Alignment Traps 


The exception handler for a data alignment trap executes with the IPL unchanged, in kernel 
mode, on the kernel stack. Table 6-8 lists the SCB entries for data alignment traps. 


Table 6-8: SCB Entries for Data Alignment Trap 


Byte offset₁₆   Entry name 

280             Unaligned_Access 
290-3F0         Unused 


6.6.5 SCB Entries for Other Synchronous Traps 


The exception handler for a synchronous trap, other than those described above, executes with 
the IPL unchanged, in the mode and on the stack indicated below. "MostPriv" indicates that 
the handler executes in either the original mode or the new mode, whichever is more
privileged. Table 6-9 lists the SCB entries for other synchronous traps. 


Table 6-9: SCB Entries for Other Synchronous Traps 


Byte offset₁₆   Entry name                    Mode 

400             Breakpoint Trap               Kernel 
410             Bugcheck Trap                 Kernel 
420             Illegal Instruction Trap      Kernel 
430             Illegal Operand Trap          Kernel 
440             Generate Software Trap        Kernel 
450             Unused 
460             Unused 
470             Unused 
480             Change Mode to Kernel         Kernel 
490             Change Mode to Executive      MostPriv 
4A0             Change Mode to Supervisor     MostPriv 
4B0             Change Mode to User           Current 
4C0-4F0         Reserved for DIGITAL 


6.6.6 SCB Entries for Processor Software Interrupts 


The exception handler for a processor software interrupt executes at the target IPL, in kernel 
mode, on the kernel stack. Table 6-10 lists the SCB entries for processor software interrupts. 


Table 6-10: SCB Entries for Processor Software Interrupts 


Byte offset₁₆   Entry name                    Target IPL₁₀ 

500             Unused 
510             Software interrupt level 1    1 
520             Software interrupt level 2    2 
530             Software interrupt level 3    3 
540             Software interrupt level 4    4 
550             Software interrupt level 5    5 
560             Software interrupt level 6    6 
570             Software interrupt level 7    7 
580             Software interrupt level 8    8 
590             Software interrupt level 9    9 
5A0             Software interrupt level 10   10 
5B0             Software interrupt level 11   11 
5C0             Software interrupt level 12   12 
5D0             Software interrupt level 13   13 
5E0             Software interrupt level 14   14 
5F0             Software interrupt level 15   15 


6.6.7 SCB Entries for Processor Hardware Interrupts and Machine Checks 


The interrupt handler for a processor hardware interrupt executes at the target IPL, in kernel 
mode, on the kernel stack. 


The handler for machine checks executes in kernel mode, on the kernel stack. The handler for 
system-correctable machine checks executes at IPL 20; the handler for all other machine 
checks executes at IPL 31. Table 6-11 lists the SCB entries for processor hardware interrupts 
and machine checks. 


Table 6-11: SCB Entries for Processor Hardware Interrupts and Machine Checks 


Byte offset₁₆   Entry name                              Target IPL₁₀ 

600             Interval clock interrupt                22 
610             Interprocessor interrupt                22 
620             System correctable machine check        20 
630             Processor correctable machine check     31 
640             Powerfail interrupt                     30 
650             Performance monitor                     29 
660             System machine check                    31 
670             Processor machine check                 31 
680-6E0         Reserved — processor specific 
6F0             Passive release                         20-23 


Processor-specific SCB entries include those used by console devices (if any) or other periph- 
erals dedicated to system support functions. 


6.6.8 SCB Entries for I/O Device Interrupts 


The interrupt handler for an I/O device interrupt executes at the target IPL, in kernel mode, on 
the kernel stack. SCB entries for offsets of 800₁₆ through 7FF0₁₆ are reserved for I/O device 
interrupts. 


6.7 PALcode Support 


6.7.1 Stack Writeability 


In response to various exceptions, interrupts, and machine checks, PALcode pushes informa- 
tion on the kernel stack. PALcode may write this information without first probing to ensure 
that all such writes to the kernel stack will succeed. If a memory management exception 
occurs while pushing information, PALcode forces the processor to enter console I/O mode, 
and subsequent actions, such as processor restart, are taken by the console. The REASON 
FOR HALT code is "processor halted due to kernel-stack-not-valid." See Console Interface 
(III), Chapter 3. 


6.7.2 Stack Residency 


The user, supervisor, and executive stacks for the current process do not need to be resident. 
Software running in kernel mode can bring in or allocate stack pages as TNV faults occur. 
However, since this activity is taking place in kernel mode, the kernel stack must be fully 
resident. 


When the faults TNV, ACV, FOR, and FOW occur on kernel mode references to the kernel 
stack, they are considered serious system failures from which recovery is not possible. If any 
of those faults occur, PALcode forces the processor to enter console I/O mode, and subsequent 
actions, such as processor restart, are taken by the console. The REASON FOR HALT code is 
"processor halted due to kernel-stack-not-valid." See Console Interface (III), Chapter 3. 


6.7.3 Stack Alignment 


Stacks may have arbitrary byte alignment, but performance may suffer if at least octaword 
alignment is not maintained by software. 


PALcode creates stack frames in response to exceptions and interrupts. Before doing so, the 
target stack is aligned to a 64-byte boundary by setting the six low bits of the target SP to 
000000₂. The previous value of these bits is stored in the SP_ALIGN field of the saved PS in 
memory, for use by a CALL_PAL REI instruction. 


Software-constructed stack frames must be 64-byte aligned and have SP_ALIGN properly set; 
otherwise, a CALL_PAL REI instruction will take an illegal operand trap. 
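

The alignment step can be sketched as follows. The shift of 56 bits is taken from the LEFT_SHIFT(save_align,56) term in the model in Section 6.7.5. The function names are hypothetical, and the restore function only illustrates how the saved bits allow the original SP to be reconstructed.

```python
SP_ALIGN_SHIFT = 56   # position of SP_ALIGN in the saved PS (per the 6.7.5 model)
LOW_SIX_BITS = 0x3F


def align_target_sp(sp: int, ps: int):
    """Return (aligned_sp, saved_ps) before a stack frame is built."""
    save_align = sp & LOW_SIX_BITS           # previous value of the six low bits
    aligned_sp = sp & ~LOW_SIX_BITS          # 64-byte aligned stack pointer
    saved_ps = ps | (save_align << SP_ALIGN_SHIFT)
    return aligned_sp, saved_ps


def restore_sp(aligned_sp: int, saved_ps: int) -> int:
    """Recombine an aligned SP with the SP_ALIGN bits from the saved PS."""
    return aligned_sp | ((saved_ps >> SP_ALIGN_SHIFT) & LOW_SIX_BITS)
```

For an SP ending in 34 (hex), the six low bits 110100 are cleared for the push and recorded in SP_ALIGN, so the original value can be recovered when the frame is removed.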


6.7.4 Initiate Exception or Interrupt or Machine Check 


Exceptions, interrupts, and machine checks are initiated by PALcode with interrupts disabled. 
When an exception, interrupt, or machine check is initiated, the associated SCB vector is read 
to determine the address of the service routine. PALcode then attempts to push the PC, PS, 
and R2..R7 onto the target stack. When an interrupt (software or hardware but not AST) is ini- 
tiated, PS<IP> is set to 1 to indicate an interrupt is in progress. Additional parameters may be 
passed in R4 and R5 on exceptions and machine checks. 


During the attempt to push this information, the exceptions (faults) TNV, ACV, and FOW can 
occur: 


• If any of those faults occur when the target stack is user, supervisor, or executive, then 
the fault is taken on the kernel stack. 


• If any of those faults occur when the target stack is the kernel stack, PALcode forces the 
processor to enter console I/O mode, and subsequent actions, such as processor restart, 
are taken by the console. The REASON FOR HALT code is "processor halted due to 
kernel-stack-not-valid." See Console Interface (III), Chapter 3. 


6.7.5 Initiate Exception or Interrupt or Machine Check Model 


check_for_exception_or_interrupt_or_mcheck: 
IF NOT {ready to initiate exception OR 
ready to initiate interrupt OR 
ready to initiate mcheck} THEN 
BEGIN 
{fetch next instruction} 
{decode and execute instruction} 
END 
ELSE 
BEGIN 
{wait for instructions in progress to complete} 
! clear interrupt pending 
tmp < 0 
IF {exception pending} THEN 
BEGIN . 

{back up implementation specific state if necessary, 
this includes the PC if synchronous trap pending} 
new_ipl <- PS<IPL> 
new_mode <- Kernel 

END 


ELSE IF {unmaskable mcheck pending} THEN 
BEGIN 
{back up implementation specific state if necessary} 
{attempt correction if appropriate} 
IF {uncorrectable AND MCES<0> = 1} THEN 
{enter console} 
ELSE IF {uncorrectable} THEN 
new_mode <— Kernel 
new_ipl < 31 
! set mcheck error flag 
MCES<0> < 1 
ELSE IF {reporting enabled} THEN 
new_mode <— Kernel 
new ipl < 31 
MCES<2> < 1 
END 
END 


ELSE IF {data alignment trap} THEN 
new_mode <- Kernel 


ELSE IF {synchronous trap} THEN 
CASE {opcode} OF 
{back up implementation specific state if necessary} 
CHME: new mode <- min(PS<CM>,Executive) 
CHMS: new_mode <— min(PS<CM>,Supervisor ) 
CHMU: new_mode <— min(PS<CM>,User) 
otherwise: new_mode <— Kernel 


ENDCASE 
        ELSE IF {maskable uncorrectable mcheck pending and IPL < 31} THEN
            BEGIN
                {back up implementation specific state if necessary}
                IF {MCES<0> = 1} THEN
                    {enter console}
                ELSE
                    new_mode <- Kernel
                    new_ipl <- 31
                    MCES<0> <- 1        ! set mcheck error flag
                END
            END

        ELSE IF {interrupt pending} THEN
            new_ipl <- {interrupt source IPL}
            tmp <- 1                    ! set interrupt pending
            new_mode <- Kernel

        ELSE IF {maskable correctable mcheck pending AND
                 reporting enabled} THEN
            new_ipl <- 20
            MCES<1> <- 1
            new_mode <- Kernel
        END

        IPR_SP[PS<CM>] <- SP
        new_sp <- IPR_SP[new_mode]
        save_align <- new_sp<5:0>
        new_sp<5:0> <- 0

        PUSH(PS OR LEFT_SHIFT(save_align,56), old_pc, new_mode)
        PUSH(R7, R6, new_mode)
        PUSH(R5, R4, new_mode)
        PUSH(R3, R2, new_mode)

        PS<SW> <- 0
        PS<CM> <- new_mode
        PS<IP> <- tmp
        PS<IPL> <- new_ipl
        SP <- new_sp

        IF {memory management fault} THEN
            R4 <- VA
            R5 <- MMF
        END

        IF {data alignment trap} THEN
            R4 <- VA
            R5 <- { 0 if read/load 1 if write/store }
        END
        IF {mcheck or correctable error interrupt} THEN
            IF {logout frame built}
                R4 <- logout_area_offset
            ELSE
                R4 <- -1
            END
        END

        IF {arithmetic trap} THEN
            R4 <- register write mask
            R5 <- exception summary
        END

        IF {software interrupt} THEN
            SISR <- SISR AND NOT{ 2**{ PRIORITY_ENCODE(SISR) } }
        END

        vector <- {exception or interrupt or mcheck SCB offset}

        R2 <- (SCBB + vector)
        R3 <- (SCBB + vector + 8)
        PC <- R2

    END
GOTO check_for_exception_or_interrupt_or_mcheck


PROCEDURE PUSH(first, last, mode)
BEGIN
    IF ACCESS(new_sp - 16, mode) THEN
        BEGIN
            (new_sp - 8) <- first
            (new_sp - 16) <- last
            new_sp <- new_sp - 16
            RETURN
        END
    ELSE
        {initiate ACV, TNV, or FOW fault, or
         Kernel Stack Not Valid restart sequence}
    END
END
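

The stack-frame portion of the model can be illustrated with a short Python sketch. It is not
part of the architecture definition: the names align_stack, push, and build_frame are
hypothetical, memory is modeled as a plain dictionary keyed by byte address, and the ACCESS
check (with its fault and Kernel Stack Not Valid paths) is deliberately omitted.

```python
MASK64 = (1 << 64) - 1  # registers and stack slots are 64-bit quadwords


def align_stack(sp):
    """Clear SP<5:0> and return (aligned_sp, save_align)."""
    save_align = sp & 0x3F
    return sp & ~0x3F & MASK64, save_align


def push(memory, new_sp, first, last):
    """Model PUSH(first, last, mode), omitting the ACCESS check (assumption)."""
    memory[new_sp - 8] = first & MASK64
    memory[new_sp - 16] = last & MASK64
    return new_sp - 16


def build_frame(memory, sp, ps, old_pc, regs):
    """Push PS/PC and R7..R2 in the order used by the model; return the new SP."""
    new_sp, save_align = align_stack(sp)
    # The saved PS carries the original low six SP bits in bits <61:56>.
    new_sp = push(memory, new_sp, ps | (save_align << 56), old_pc)
    new_sp = push(memory, new_sp, regs[7], regs[6])
    new_sp = push(memory, new_sp, regs[5], regs[4])
    new_sp = push(memory, new_sp, regs[3], regs[2])
    return new_sp
```

Under these assumptions the four pushes build a 64-byte frame with R2 at the lowest address
and the saved PS at the highest, and the new stack pointer remains 64-byte aligned.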


6.7.6 PALcode Interrupt Arbitration 


The following sections describe the logic for the interrupt conditions produced by the
specified operation.
6.7.6.1 Writing the AST Summary Register


Writing the ASTSR internal processor register (Section 5.3) requests an AST for any of the
four processor modes. This operation may request an AST on a formerly inactive level and
thus cause an AST interrupt. The logic required to check for this condition is:


ASTSR<3:0> <- {ASTSR<3:0> AND R16<3:0>} OR R16<7:4>
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}


6.7.6.2 Writing the AST Enable Register 


Writing the ASTEN internal processor register (Section 5.3) enables ASTs for any of the four 
processor modes. This operation may enable an AST on a formerly inactive level and thus 
cause an AST interrupt. The logic required to check for this condition is: 


ASTEN<3:0> <- {ASTEN<3:0> AND R16<3:0>} OR R16<7:4>
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}
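

The ASTSR and ASTEN writes share the same clear-then-set update: R16<3:0> selects the bits
to keep and R16<7:4> selects the bits to set. The following Python sketch illustrates that
update and the kernel-mode AST check; the function names are hypothetical and the registers
are modeled as plain integers.

```python
def update_ast_register(old, r16):
    """Return {old<3:0> AND R16<3:0>} OR R16<7:4>, limited to four bits."""
    keep_mask = r16 & 0xF
    set_bits = (r16 >> 4) & 0xF
    return ((old & keep_mask) | set_bits) & 0xF


def kernel_ast_due(asten, astsr, ipl):
    """True when ASTEN<0> AND ASTSR<0> AND PS<IPL> LT 2."""
    return bool(asten & 1) and bool(astsr & 1) and ipl < 2


# Example: request a kernel AST (set bit 0) while keeping the other bits.
astsr = update_ast_register(0b0000, 0x1F)
due = kernel_ast_due(asten=0b0001, astsr=astsr, ipl=0)
```

Writing R16 = 0x0F leaves the register unchanged, and R16 = 0x0E clears bit 0 while keeping
the rest, which is why a single write can both clear and set bits.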


6.7.6.3 Writing the IPL Register 


Writing the IPL internal processor register (Section 5.3) changes the current IPL. This
operation may enable an AST or software interrupt on a formerly inactive level and thus cause
an AST or software interrupt. The logic required to check for this condition is:


PS<IPL> <- R16<4:0>

! check for software interrupt at level 2..15
IF {RIGHT_SHIFT({SISR AND FFFC(hex)}, PS<IPL> + 1) NE 0} THEN
    {initiate software interrupt at IPL of high bit set in SISR}

! check for AST
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}

! check for software interrupt at level 1
IF SISR<1> AND {PS<IPL> EQ 0} THEN
    {initiate software interrupt at IPL 1}
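

The three checks made after an IPL write can be sketched in Python as follows. The function
name write_ipl and the tuple it returns are illustrative only, and the sketch assumes the
checks are prioritized in the order listed, so that the first matching condition wins.

```python
def write_ipl(sisr, asten, astsr, r16):
    """Return (new_ipl, event), where event is None or (kind, ipl)."""
    ipl = r16 & 0x1F                      # PS<IPL> <- R16<4:0>
    high_levels = sisr & 0xFFFC           # software levels 2..15
    if (high_levels >> (ipl + 1)) != 0:
        # Deliver at the level of the highest bit set in SISR.
        return ipl, ("software", high_levels.bit_length() - 1)
    if (asten & 1) and (astsr & 1) and ipl < 2:
        return ipl, ("ast", 2)
    if ((sisr >> 1) & 1) and ipl == 0:
        return ipl, ("software", 1)
    return ipl, None
```

For example, with a level 3 software interrupt pending (SISR = 0b1000), lowering the IPL to 2
delivers it, while setting the IPL to 3 leaves it pending.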
6.7.6.4 Writing the Software Interrupt Request Register


Writing the SIRR internal processor register (Section 5.3) requests a software interrupt at one 
of the fifteen software interrupt levels. This operation may cause a formerly inactive level to 
cause a software interrupt. The logic required to check for this condition is: 


SISR<level> <- 1
IF level GT PS<IPL> THEN
    {initiate software interrupt at IPL level}
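

A minimal Python sketch of the request side, paired with the SISR bit clearing that the
initiate model in Section 6.7.5 performs on dispatch. The function names are hypothetical,
and PRIORITY_ENCODE is assumed here to return the position of the most significant set bit.

```python
def write_sirr(sisr, level, ipl):
    """Set SISR<level>; return (new_sisr, initiate_now)."""
    if not 1 <= level <= 15:
        raise ValueError("software interrupt levels are 1..15")
    return sisr | (1 << level), level > ipl


def dispatch_software_interrupt(sisr):
    """Clear the highest pending SISR bit; return (level, new_sisr)."""
    level = sisr.bit_length() - 1         # assumed PRIORITY_ENCODE(SISR)
    return level, sisr & ~(1 << level)
```

A request at or below the current IPL only records the bit in SISR; the interrupt is then
taken later, when a write to IPL or an REI lowers the IPL below the requested level.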


6.7.6.4.1 Return from Exception or Interrupt 


The CALL_PAL REI instruction (Section 2.1.11) writes both the Current Mode and IPL fields 
of the PS (see Section 6.2). This may enable a formerly disabled AST or software interrupt to 
occur. The logic required to check for this condition is: 


PS <- New PS

! check for software interrupt at level 2..15
IF {RIGHT_SHIFT({SISR AND FFFC(hex)}, PS<IPL> + 1) NE 0} THEN
    {initiate software interrupt at IPL of high bit set in SISR}

! check for AST
tmp <- NOT LEFT_SHIFT(1110(bin), PS<CM>)
IF {{tmp AND ASTEN AND ASTSR}<3:0> NE 0} AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}

! check for software interrupt at level 1
IF SISR<1> AND {PS<IPL> EQ 0} THEN
    {initiate software interrupt at IPL 1}
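

The tmp mask in the AST check selects the modes whose ASTs are deliverable from the new
current mode. The Python sketch below works through it, assuming the PS<CM> encoding
Kernel = 0, Executive = 1, Supervisor = 2, User = 3; the function names are hypothetical.

```python
def ast_mask_for_mode(cm):
    """Return NOT LEFT_SHIFT(1110(bin), cm), limited to bits <3:0>."""
    return ~(0b1110 << cm) & 0xF


def rei_ast_due(cm, ipl, asten, astsr):
    """True when an enabled, pending AST is deliverable after REI."""
    return (ast_mask_for_mode(cm) & asten & astsr & 0xF) != 0 and ipl < 2


# Masks for Kernel, Executive, Supervisor, User (bit n stands for mode n).
masks = [ast_mask_for_mode(cm) for cm in range(4)]
```

Under that encoding the masks are 0001, 0011, 0111, and 1111 (binary), so a return to user
mode can deliver an AST for any mode, while a return to kernel mode can deliver only a
kernel AST.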


6.7.6.5 Swap AST Enable 


Swapping the AST enable state for the Current Mode results in writing the ASTEN internal 
processor register (see Section 5.3). This operation may enable a formerly disabled AST to 
cause an AST interrupt. The logic required to check for this condition is: 


R0 <- ZEXT(ASTEN<PS<CM>>)
ASTEN<PS<CM>> <- R16<0>

IF ASTEN<PS<CM>> AND ASTSR<PS<CM>> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}
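

The swap can be sketched in Python as a function that returns the old enable bit, the updated
ASTEN value, and whether an AST interrupt is now due. The name swasten and its argument
list are illustrative, and PS<CM> is assumed to be a mode number from 0 to 3.

```python
def swasten(asten, astsr, cm, ipl, r16):
    """Return (old_enable, new_asten, ast_due) for the current mode cm."""
    old_enable = (asten >> cm) & 1                    # R0 <- ZEXT(ASTEN<PS<CM>>)
    new_asten = (asten & ~(1 << cm) & 0xF) | ((r16 & 1) << cm)
    ast_due = (
        bool((new_asten >> cm) & 1)
        and bool((astsr >> cm) & 1)
        and ipl < 2
    )
    return old_enable, new_asten, ast_due
```

Enabling ASTs for a mode that already has one pending in ASTSR makes the interrupt due
immediately, provided the IPL is below 2.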
6.7.7 Processor State Transition Table


Table 6-12 shows the operations that can produce a state transition and the specific transition
produced. For example, if a processor’s initial state is supervisor mode, it is not possible for
the processor to transition to a program halt condition. A processor can only transition to
program halt from kernel mode.


In Table 6-12: 


"REI" increases mode or lowers IPL. 


"MTPR" changes IPL or is a CALL_PAL MTPR_ASTSR or CALL_PAL 
MTPR_ASTEN instruction that causes an interrupt request. 


"Exc" is a state change caused by an exception. 
"Int" is a state change caused by an interrupt. 


"Mcheck" is a state change caused by a machine check. 


Table 6-12: Processor State Transitions


                 Final State:
Initial State:   User      Super.    Exec.     Kernel    Program Halt

User             CHMU      CHMS      CHME      CHMK      Not Possible
                 REI                           Exc
                                               Int
                                               Mcheck
                                               SWASTEN

Supervisor       REI       CHMS      CHME      CHMK      Not Possible
                           REI                 Exc
                                               Int
                                               Mcheck
                                               SWASTEN

Executive        REI       REI       CHME      CHMK      Not Possible
                                     REI       Exc
                                               Int
                                               Mcheck
                                               SWASTEN

Kernel           REI       REI       REI       CHMK      HALT
                                               REI
                                               Exc
                                               Int
                                               Mcheck
                                               MTPR
                                               SWASTEN
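

Table 6-12 can also be treated as a lookup from an (initial, final) state pair to the set of
operations that produce that transition. The Python sketch below is one reading of the
table, not a normative encoding: in particular it assumes that Exc, Int, Mcheck, and SWASTEN
belong to the Kernel column in every row, and the names TRANSITIONS and operations are
hypothetical.

```python
TO_KERNEL = {"CHMK", "Exc", "Int", "Mcheck", "SWASTEN"}

# (initial state, final state) -> operations that cause the transition
TRANSITIONS = {
    ("User", "User"): {"CHMU", "REI"},
    ("User", "Supervisor"): {"CHMS"},
    ("User", "Executive"): {"CHME"},
    ("User", "Kernel"): set(TO_KERNEL),
    ("Supervisor", "User"): {"REI"},
    ("Supervisor", "Supervisor"): {"CHMS", "REI"},
    ("Supervisor", "Executive"): {"CHME"},
    ("Supervisor", "Kernel"): set(TO_KERNEL),
    ("Executive", "User"): {"REI"},
    ("Executive", "Supervisor"): {"REI"},
    ("Executive", "Executive"): {"CHME", "REI"},
    ("Executive", "Kernel"): set(TO_KERNEL),
    ("Kernel", "User"): {"REI"},
    ("Kernel", "Supervisor"): {"REI"},
    ("Kernel", "Executive"): {"REI"},
    ("Kernel", "Kernel"): TO_KERNEL | {"REI", "MTPR"},
    ("Kernel", "Program Halt"): {"HALT"},
}


def operations(initial, final):
    """Return the operations for a transition; empty means Not Possible."""
    return TRANSITIONS.get((initial, final), set())
```

For instance, operations("Supervisor", "Program Halt") is empty, matching the example in the
text, while operations("Kernel", "Program Halt") contains only HALT.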
6.8 Revision History


Revision 7.0, November 13, 1997


1. Converted to FrameMaker
2. Alpha AXP -> Alpha
3. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994 


1. Alpha -> Alpha AXP
2. Edited SBC section for A_SRM note 167.4
3. Edited for A_SRM note 167.5, register write mask for R31/F31 /w arith trap
4. For A_SRM note 171.2, fixed pointer to I-4.7.5.x
5. Added ECO 61, Trap unused function codes
6. Added ECO #57, Initiate E/I/Mc model corrections
7. Corrected description of PC in stack frames section


Revision 5.0, May 12, 1992 


1. Removed intr_flag and lock_flag from initiate excep inter mcheck model
2. Added eco #45: correctable errors (machine checks), performance monitor, and passive
   release information
3. Conditionalized references to platform section
4. Widget -> device
5. Reordered and combined sections to consolidate information
6. Added eco #30, #44 (DATFX) also eco #29 (GENTRAP)
7. Corrected init exception model for eco 25 PS(IP) bit and eco 23 (timer)
8. DRAINT to TRAPB
9. Converted to SDML
10. Added ECO #18, #23 (removed AT references), #25
11. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991 


1. On Memory Management Faults, R4 now contains the exact faulting address
2. Removed references to D_float
3. Typos
4. Note reason for unaligned load locked and store conditional vectors
5. Correct reference from AST Request Register to AST Summary Register
6. Correct pointer to location of physical address of error logout area from R2 to R4 in
   Processor Machine Check Abort section
7. Correct two references from Corrected Error logout area to Machine Check logout area
8. Change name of ‘instruction issue model’ to ‘initiate exception or interrupt model’
9. Swap order of data alignment trap and synchronous trap code fragments in initiate
   exception or interrupt model
10. Correct which bits are loaded (<4:0>) from R16 into IPR IPL by MTPR IPL
11. Add REI* and CHMx to each entry along the main diagonal of the Processor State
    Transition table
12. Describe machine check logout area as reserved for PALcode and console use
13. Add R2..R7 to values restored by REI in ‘Stack Frames’ text
14. Modify logic statement for Swap AST Enable so that it reflects CALL_PAL
    SWASTEN instruction action
15. Remove references to ASTs as ‘interrupts’, substituting ‘exception’ where appropriate
16. Move and modify reference to ASTs in last bullet item of section Exceptions to section
    Asynchronous System Trap
17. Define meaning of ‘trigger instruction’ in text of arithmetic trap description
18. Change values defined in R5 for memory management faults to full quadword values
19. Modify initiate exception or interrupt model to show bit corresponding to software
    interrupt being dispatched to is cleared before the dispatch
20. Modified tense of description of saved PC for arithmetic trap from ‘would have issued’
    to ‘would have been issued’
21. Move power-fail text at end of section Interprocessor Interrupt Request Register to end
    of subtext of section Interrupts
22. Clarify reference to ‘RA’ in initiate exception or interrupt pseudocode
23. Change ‘vector <- {exception ..}’ to ‘vector <- {exception or interrupt ..}’ in initiate
    exception or interrupt pseudocode
24. Note that there are four per-mode SCB vectors for ASTs
25. Add entry for Software Interrupts to table Exceptions and Interrupts Summary
26. Restrict the class of instructions that are described as taking Invalid Operation traps on
    non-finite values
27. Clarify that, following a memory management fault, R4 contains an address within the
    implementation-dependent-sized page that contains the faulting address
28. Reorganize the sections on synchronous traps (starting around current section
    $$section(synchr_trap)) to eliminate references to ASTs under Other Synchronous Traps
    category
29. Elaborate Interval Clock Interrupt description
30. Changed Load and Store D to G in SCB entries table for Alignment Traps
31. Moved ‘perf. monitor’ from Asynchronous Traps to Hardware Interrupts


Revision 3.0, March 2, 1990 


1. Get PS/PC in correct order in stack frames
2. Restructure stack frames and R2..R7
3. Increase stack frame alignment to 64 byte
4. Restructure SCB
5. Change some faults to synchronous traps
6. Redo and simplify arithmetic traps
7. Rework AST delivery to match VAX
8. Specify writeback cache behavior at powerfail
9. Remove IPL from Processor State Transition Table
10. Remove Privileged instruction Trap


Revision 2.0, October 4, 1989 


1. Remove interrupt stack
2. Remove kernel stack not valid abort
3. Remove stack alignment requirement, add PS<SP_ALIGN>
4. Remove ICIE and IPIE interrupt enables
5. Remove FREEZE of PC
6. Remove references to WAIT
7. Add DRAINT and DRAINA
8. Delete operand faults
9. Make data alignment fault stay in current mode
10. Simplify floating exceptions


Revision 1.0, May 23, 1989 


1. First review distribution


A


Absolute longword queue, 2-21 





Absolute quadword queue, 2-24 


Access control violation (ACV) fault, 6-10 

has precedence, 3-13 

memory protection, 3-8 

service routine entry point, 6-27 
Address space match (ASM) 

bit in PTE, 3-5

TBIAP register uses, 5-25 
Address space number (ASN) register, 5-4 

in HWPCB, 4-2 

privileged context, 2-91 

range supported, 3-12 

TBCHK register uses, 5-23 

TBIS register uses, 5-26 

translation buffer with, 3-11 
Address translation 

algorithm to perform, 3-9 

page frame number (PFN), 3-8 

page table structure, 3-8 

performance enhancements, 3-10 

translation buffer with, 3-11 

virtual address segment fields, 3-8 
Alignment 

data alignment trap, 6-15 

program counter (PC), 6-6 

stack, 6-31 

when data is unaligned, 6-28 


AMOVRM (PALcode) instruction, 2—75 
AMOVRR (PALcode) instruction, 2-75 
Arithmetic exceptions. See Arithmetic traps 


Arithmetic traps 
described, 6-12 
division by zero, 6—14 
F31 as destination, 6-12 
inexact result, 6-14 
integer overflow, 6-15 
invalid operation, 6-14 
overflow, 6-14 
program counter (PC) value, 6-14 
R31 as destination, 6-12 
recorded for software, 6-12 
REI instruction with, 6-9

service routine entry point, 6-27

underflow, 6-14 

when concurrent with data alignment, 6-15 

when registers affected by, 6-13 
AST enable (ASTEN) register 

changing access modes in, 4—4 

described, 5-5 

in HWPCB, 4-2 

interrupt arbitration, 6-35 

operation (with ASTs), 4-4 

privileged context, 2-91 

SWASTEN instruction with, 2-19 
AST summary (ASTSR) register 

described, 5-7 

in HWPCB, 4-2 

indicates pending ASTs, 4-4

interrupt arbitration, 6-35 

privileged context, 2-91 
Asynchronous system traps (AST) 

ASTEN/ASTSR registers with, 4—4 

initiating Process 

context switching the, 4—4 

interrupt, defined, 6-20 

service routine entry point, 6—27 

with PS register, 4-4


Atomic move operations, 2-74 


Atomic operations 
modifying page table entry, 3-6 


B


BPT (PALcode) instruction, 2-4


service routine entry point, 6—28 
trap information, 6-16 


Breakpoint exceptions, initiating, 2-4 





Bugcheck exception, initiating, 2-5 


BUGCHK (PALcode) instruction, 2-5 


service routine entry point, 6-28 
trap information, 6-16 


Byte_within_page field, 3-2 
C


Caches, flushing physical page from, 2-83 

CFLUSH (PALcode) instruction, 2-83 
with powerfail, 6-22 

Charged process cycles register, 2-91 
in HWPCB, 4-2 
PCC register and, 4-3 

CHME (PALcode) instruction, 2-6 
service routine entry point, 6-28 
trap initiation, 6-17 

CHMK (PALcode) instruction, 2-7 
service routine entry point, 6—28 
trap initiation, 6-17 

CHMS (PALcode) instruction, 2-8 
service routine entry point, 6-28 
trap initiation, 6-17 

CHMU (PALcode) instruction, 2-9 
service routine entry point, 6-28 
trap initiation, 6-17 

CLRFEN (PALcode) instruction, 2—10 

Context switching 
defined, 4-1 
hardware, 4—2 
initiating, 2-91 
raising IPL while, 4-4
software, 4-2 
See also Hardware 

Corrected error interrupts, logout area for, 6-24 


CSERVE (PALcode) instruction, 2-84 
Current mode field, in PS register, 6-6 
Current PC, 6-2 


D 


Data alignment trap (DAT) register 
privileged context, 2-91 

Data alignment traps, 6-15 
fixup (DAT) bit, in HWPCB, 4-2 
fixup (DATFX) register, 5-9 
registers used, 6-15
service routine entry point, 6-28 
when concurrent with arithmetic, 6-15 


Division by zero trap, 6-14 








DPC bit, machine check error summary register, 
5-14 

DSC bit, machine check error summary register, 
5-14 


DZE bit, exception summary parameter, 6-13


E


Exception service routines
entry point, 6-26
introduced, 6-8 
Exception summary parameter, 6-12 


Exceptional events 
actions, summarized, 6—2 
defined, 6-1 

Exceptions 
actions, summarized, 6—2 
initiated before interrupts, 6-18 
initiated by PALcode, 6-31 
introduced, 6-8 
processor state transitions, 6-37 
stack frames for, 6—7 
See also Arithmetic traps 


Executive read enable (ERE), bit in PTE, 3-5 


Executive stack pointer (ESP) register, 5-10 
as internal processor register, 5-1 
in HWPCB, 4-2 


Executive write enable (EWE), bit in PTE, 3-4 


F 


F_floating data type 
when data is unaligned, 6-28 
Fault on execute (FOE), 6-12
bit in PTE, 3-6
service routine entry point, 6-27
software usage of, 6-12
Fault on read (FOR), 6-11
bit in PTE, 3-6
service routine entry point, 6-27
software usage of, 6-11
Fault on write (FOW), 6-11
bit in PTE, 3-6
service routine entry point, 6-27
software usage of, 6-11
Faults, 6-9 
access control violation, 6-10 
defined, 6-9 
fault on execute, 6—12 
fault on read, 6-11 
fault on write, 6-11 
floating-point disabled, 6-10 
MM flag, 6-10 
program counter (PC) value, 6-9 
REI instruction with, 6-9 
translation not valid, 6—10 
Floating-point disabled fault, 6—10 
service routine entry point, 6-27 
Floating-point enable (FEN) register 
clearing, 2-10
described, 5-11
in HWPCB, 4-2
privileged context, 2-91
FOE. See Fault on execute 


FOR. See Fault on read 
FOW. See Fault on write


G 


G_floating data type 
when data is unaligned, 6-28 
GENTRAP (PALcode) instruction, 2-11 
trap information, 6-17 
Granularity hint (GH) 
bits in PTE, 3-5 





H 


Hardware interrupts 


interprocessor, 6-21 
interval clock, 6—20 
powerfail, 6-21 
Hardware nonprivileged context, 4-3 





Hardware privileged context, 4-2 
switching, 4—2 
Hardware privileged context block (HPCB) 
process unique value in, 2-79 
swapping ownership of, 2-91 
Hardware privileged context block (HWPCB) 
format, 4-2 
original built by HWRPB, 4-5 
PCBB register, 5—16 
specified by PCBB, 4—2 
writing to, 4-3 
Hardware restart parameter block (HWRPB) 
interval clock interrupt, 6-20 
logout area, 6—24 


HWPCB. See Hardware privileged context block
HWRPB. See Hardware restart parameter block 


I


I/O device interrupts, 6-20





I/O devices, service routine entry points, 6-30 
Illegal instruction trap, 6-16 

service routine entry point, 6-28 
Illegal operand trap, service routine entry point, 

6-28 

Illegal PALcode operand trap, 6—16 
INE bit, exception summary parameter, 6-13 
Inexact result trap, 6-14 


Insert into queue PALcode instructions 
longword, 2-46 
longword at head interlocked, 2-30 
longword at head interlocked resident, 2-32 
longword at tail interlocked, 2-38 
longword at tail interlocked resident, 2—40 
quadword, 2-48 
quadword at head interlocked, 2-34 
quadword at head interlocked resident, 2-36
quadword at tail interlocked, 2-42
quadword at tail interlocked resident, 2-44


INSQHIL (PALcode) instruction, 2—30 
INSQHILR (PALcode) instruction, 2-32 
INSQHIQ (PALcode) instruction, 2—34 
INSQHIQR (PALcode) instruction, 2-36 
INSQTIL (PALcode) instruction, 2-38 
INSQTILR (PALcode) instruction, 2—40 
INSQTIQ (PALcode) instruction, 2-42 
INSQTIQR (PALcode) instruction, 2-44
INSQUEL (PALcode) instruction, 2—46 
INSQUEL/D (PALcode) instruction, 2-46 
INSQUEQ (PALcode) instruction, 2—48 
INSQUEQ/D (PALcode) instruction, 2-48 


Instruction formats 
illegal trap, 6-16 
Integer overflow trap, 6-15 


Internal processor registers (IPR) 


address space number, 5—4 
AST enable, 5-5 
AST summary, 5-7 
CALL_PAL MFPR with, 5-1 
CALL_PAL MTPR with, 5-1 
data alignment trap fixup, 5—9 
defined, 1-1 
executive stack pointer, 5-10 
floating-point enable, 5—11 
interprocessor interrupt request, 5—12 
interrupt priority level, 5—13 
kernel mode with, 5-1
machine check error summary, 5—14 
MFPR instruction with, 2-86 
MTPR instruction with, 2-87 
page table base, 5-18 
performance monitoring, 5—15 
privileged context block base, 5-16 
processor base, 5-17 
software interrupt request, 5-20 
software interrupt summary, 5—21 
summarized, 5-2 
supervisor stack pointer, 5-22 
system control block base, 5-19 
translation buffer check, 5-23
translation buffer invalidate all, 5—24 
translation buffer invalidate all process, 5-25 
translation buffer invalidate single, 5—26 
user stack pointer, 5-27 
virtual page base, 5—28 
Who-Am-I, 5-29 

Interprocessor interrupt, 6-21 
protocol for, 6-21 
service routine entry point, 6-29 


Interprocessor interrupt request (IPIR) register 


described, 5-12 
protocol for, 6-21 
Interrupt pending (IP) field, in PS register, 6-6
Interrupt priority level, 6-7 


Interrupt priority level (IPL)
events associated with, 6-18 
field in PS register, 6-6 
hardware levels, 6-7 
kernel mode software with, 6-18 
operation of, 6-17 
recording pending software (SISR register), 
5-21 
requesting software (SIRR register), 5-20 
service routine entry points, 6-29 
software interrupts, 6-19 
software levels, 6—7 
See also Interrupt priority level (IPL) register 


Interrupt priority level (IPL) register 
described, 5-13 
interrupt arbitration, 6-35 
See also Interrupt priority level (IPL) 
Interrupt service routines 
entry point, 6-26 
in each process, 6-18 
introduced, 6—17 
Interrupts 
actions, summarized, 6-2
hardware arbitration, 6-34 
I/O device, 6-20 
initiated by PALcode, 6-31 
initiation, 6-18 
instruction completion, 6—17 
interprocessor, 6-21 
introduced, 6-17 
PALcode arbitration, 6—34 
passive release, 6—20 
powerfail, 6-21 
processor state transitions, 6-37 
program counter value, 6-2 
software, 6-19 
stack frames for, 6—7 
Interval clock interrupt, 6-20 
service routine entry point, 6-29 
INV bit 
exception summary parameter, 6-13 


Invalid operation trap, 6-14 
IOV bit 
exception summary parameter, 6-13 
IPL. See Interrupt priority level 
IPR. See Internal processor registers (IPR) 


IPR_KSP (internal processor register kernel stack 
pointer), 5-1 


K 


Kernel read enable (KRE) 

bit in PTE, 3-5 

with access control violation (ACV) fault, 3-13 
Kernel stack pointer (KSP) register 
in HWPCB, 4-2
Kernel stack, PALcode access to, 6-30 


Kernel write enable (KWE), bit in PTE, 3-4 


L 


LDF instruction, when data is unaligned, 6-28 





LDG instruction, when data is unaligned, 6-28 
LDL instruction, when data is unaligned, 6-28 
LDQ instruction, when data is unaligned, 6-28 
LDQ_L instruction, when data is unaligned, 6—28 
LDQP (PALcode) instruction, 2-85 

LDS instruction, when data is unaligned, 6-28 
LDT instruction, when data is unaligned, 6-28 
Load instructions, when data is unaligned, 6-28 
Logout area, 6-24 


M 


Machine check error summary (MCES) register 
described, 5-14 
using, 6-24 

Machine checks, 6—22 
actions, summarized, 6—2 
initiated by PALcode, 6-31 
logout area, 6-24 
masking, 6-23 
no disabling of, 6-23 
one per error, 6-24 
processor correctable, 6-23 
program counter (PC) value, 6-23 
REI instruction with, 6—23 
retry flag, 6-24 
service routine entry points, 6-29 
stack frames for, 6—7 
system correctable, 6-23 


Machine checks service routines, entry point, 6-26 
Masking, machine checks with, 6—23 


MCK bit, machine check error summary register, 
5-14 
Memory management 


address translation, 3-8 
always enabled, 3-3 

faults, 3-13, 6-10 

introduced, 3-1 

page frame number (PFN), 3-6 
page table entry (PTE), 3-4 
protection code, 3-7 
protection of individual pages, 3-7 
PTE modified by software, 3-6 
translation buffer with, 3-11 
unrecoverable error, 6-22 

with interrupts, 6-18 

with multiprocessors, 3-6 
with process context, 4-1
See also Address translation 


Memory management faults 


registers used, 6-10 
with unaligned data, 6-16 


MFPR_IPR_name (PALcode) instruction, 2—86 
MTPR_IPR_name (PALcode) instruction, 2—87 


Multiprocessor environment 


interprocessor interrupt, 6-21 
memory faults, 6-11 
memory management in, 3-6 
move operations in, 2-74 


Multithread implementation, 2-79 


N 


Next PC, 6-2 


O 


OpenVMS Alpha PALcode instructions (list), 2-1 
Overflow trap, 6-14 








OVF bit, exception summary parameter, 6-13 


P


Page frame number (PFN) 
bits in PTE, 3-4 
determining validation, 3-6 
finding for SCB, 5-19 
PTBR register, 5-18 
with address translation, 3-8 
with hardware context switching, 4-3 
Page table base (PTBR) register, 5-18 
in HWPCB, 4-2 
privileged context, 2-91 
with address translation, 3-8 
Page table entry (PTE), 3-4
after software changes, 3-11 
atomic modification of, 3-6 
modified by software, 3-6 
page protection, 3-7 
with multiprocessors, 3-6 
Pages 
collecting statistics on, 6-11 
individual protection of, 3-7 
max address size from, 3-3 
possible sizes for, 3-2 
virtual address space from, 3-2 


PALcode 


access to kernel stack, 6—30 

illegal operand trap, 6-16 

memory management requirements, 3-3 
OpenVMS Alpha, defined for, 2-1 
processor state transitions, 6-37 

queue data type support, 2—21 

See also Queues, support for 
PALcode instructions
OpenVMS Alpha (list), 2-1 
OpenVMS Alpha privileged (list), 2-82 
OpenVMS Alpha unprivileged (list), 2-3 
VAX compatibility, 2—74 
PALcode instructions, OpenVMS Alpha privileged 
cache flush, 2-83 
console service, 2—84 
load quadword physical, 2-85 
move from processor register, 2-86 
move to processor register, 2—87 
store quadword physical, 2-88 
swap PALcode image, 2-92 
swap privileged context, 2-89 
PALcode instructions, OpenVMS Alpha unprivileged 
breakpoint, 2—4 
bugcheck, 2-5 
change to executive mode, 2-6 
change to kernel mode, 2-7 
change to supervisor mode, 2-8 
change to user mode, 2-9 
Clear floating-point trap, 2—10 
generate software trap, 2-11 
insert into queue (list), 2—29 
probe for read access, 2-12 
probe for write access, 2—12 
read processor status, 2-14 
read system cycle counter, 2-17 
read unique context, 2-80 
return from exception or interrupt, 2-15 
swap AST enable, 2-19 
thread, 2-79 
write PS software field, 2-20 
write unique context, 2-81 


PALcode swapping, 2-92 
Passive release interrupts, 6-20 
entry point, 6-29 
PCE bit, machine check error summary register, 
5-14 
Performance monitor (PME) register 
privileged context, 2-91 
Performance monitor interrupt entry point, 6-29 
Performance monitoring enable (PME) bit 
in HWPCB, 4-2 
Performance monitoring register (PERFMON), 
5-15 
PFN. See Page frame number 
Physical address space, 3-3 
Physical address translation, 3-9 
PMI bus, uncorrected protocol errors, 6-22 
Powerfail interrupt, 6-21 
service routine entry point, 6—29 
Powerfail, CFLUSH PALcode instruction with, 
6-22 
Privileged context, 2-91
Privileged context block base (PCBB) register, 5-16
PROBER (PALcode) instruction, 2-12 

PROBEW (PALcode) instruction, 2-12 

Process, 4—1 

Processor base (PRBR) register, 5-17 


Processor cycle counter (PCC) register 
for OpenVMS Alpha, 1-2 
system cycle counter with, 2-17 
See also Charged process cycles 
Processor hardware interrupt, service routine entry 
points, 6-29 
Processor modes, 3-1 
AST pending state, 5—7 
change to executive, 2-6 
change to kernel, 2-7 
change to supervisor, 2-8 
change to user, 2-9 
controlling memory access, 3-7 
enabling executive mode reads, 3-5 
enabling executive mode writes, 3-4 
enabling kernel mode reads, 3-5 
enabling supervisor mode reads, 3-5
enabling supervisor mode writes, 3-4
enabling user mode reads, 3-4 
enabling user mode writes, 3-4 
page access with, 3-2 
PALcode state transitions, 6—37 


Processor stacks, 6—7 
Processor state transitions, 6-37 
Processor state, defined, 6—5 


Processor status (PS) register 
bit summary, 6—6 
bootstrap values in, 6-6 
current, 6—5 
defined, 1-1 
explicit reading/writing of, 6—5 
in processor state, 6-5 
saved on stack, 6-5 
saved on stack frame, 6—7 
WR_PS_SW instruction, 2-20 
Program counter (PC) register 
alignment, 6-6 
current PC defined, 6-2 
explicit reading of, 6-7 
in processor state, 6-5 
saved on stack frame, 6-7 
with arithmetic traps, 6-14
with faults, 6-9
with interrupts, 6-2
with machine checks, 6-23 
with synchronous traps, 6-15 
Protection code, 3-7 
Protection modes, 6—7 
PS<SP_ALIGN> field, 2-14


PTE. See Page table entry 
Q


Quadword data type 
loading in physical memory, 2-85 
storing to physical memory, 2-88 
Queues, support for 
absolute longword, 2-21 
absolute quadword, 2-24 
PALcode instructions (list), 2-29 
self-relative longword, 2-21 
self-relative quadword, 2-25 





R 


R31 

with arithmetic traps, 6-12 
RD_PS (PALcode) instruction, 2-14 
READ_UNQ (PALcode) instruction, 2-80
Register write mask, with arithmetic traps, 6-13 





Registers 
OpenVMS Alpha usage of, 1-1 
with IPRs, 5-1 

REI (PALcode) instruction, 2—15 
arithmetic traps, 6-9
faults, 6-9 
interrupt arbitration, 6—36 
interrupts, 6—2 
machine checks, 6—23 
synchronous traps, 6—15 

Remove from queue PALcode instructions 
longword, 2-70 
longword at head interlocked, 2-50 
longword at head interlocked resident, 2-53 
longword at tail interlocked, 2-60 
longword at tail interlocked resident, 2-63 
quadword, 2-72 
quadword at head interlocked, 2-55 
quadword at head interlocked resident, 2-58 
quadword at tail interlocked, 2-65 
quadword at tail interlocked resident, 2-68 


REMQHIL (PALcode) instruction, 2—50 
REMQHILR (PALcode) instruction, 2—53 
REMQHIQ (PALcode) instruction, 2-55 
REMQHIQR (PALcode) instruction, 2-58 
REMQTIL (PALcode) instruction, 2-60
REMQTILR (PALcode) instruction, 2-63 
REMQTIQ (PALcode) instruction, 2-65 
REMQTIQR (PALcode) instruction, 2-68 
REMQUEL (PALcode) instruction, 2—70 
REMQUEL/D (PALcode) instruction, 2-70 
REMQUEQ (PALcode) instruction, 2—72 
REMQUEQ/D (PALcode) instruction, 2-72 


RPCC (read processor cycle counter) instruction 
RSCC instruction with, 2-18
RSCC (PALcode) instruction, 2-17
RPCC instruction with, 2-18 


S 


S_floating data type, when data is unaligned, 6-28 





SCC. See System cycle counter 

SCE bit, machine check error summary register, 
5-14 

Self-relative longword queue, 2-21 

Self-relative quadword queue, 2-25 

Software (SW) field, in PS register, 6-6 

Software completion bit, exception summary register, 
6-13 

Software interrupt request (SIRR) register 


described, 5-20 
interrupt arbitration, 6-35, 6-36 
protocol for, 6-19 


Software interrupt summary (SISR) register 


described, 5-21 
protocol for, 6-19 


Software interrupts, 6-19 


asynchronous system traps (AST), 6—20 
protocol between summary and request, 6-19 
recording pending state of, 5-21 

request (SIRR) register, 6-19 

requesting, 5-20 

service routine entry points, 6-29 

summary (SISR) register, 6-19 

supported levels of, 5—20 


Software traps, generating, 2-11 
SP. See Stack pointer 
Stack alignment, 6-31 


Stack alignment (SP_ALIGN), field in saved PS, 
6-6 


Stack frames, 6—7 

Stack pointer (SP) register, defined, 1-1 

STF instruction, when data is unaligned, 6-28 
STG instruction, when data is unaligned, 6-28 
STL instruction, when data is unaligned, 6-28 
STL_C instruction, when data is unaligned, 6—28 
Store instructions, when data is unaligned, 6-28 
STQ instruction, when data is unaligned, 6-28 
STQ_C instruction, when data is unaligned, 6-28 
STQP (PALcode) instruction, 2-88 

STS instruction, when data is unaligned, 6-28 
STT instruction, when data is unaligned, 6-28 
Supervisor read enable (SRE), bit in PTE, 3-5 
Supervisor stack pointer (SSP) register, 5-22
as internal processor register, 5-1 
in HWPCB, 4-2 


Supervisor write enable (SWE), bit in PTE, 3-4


SWASTEN (PALcode) instruction, 2-19 
interrupt arbitration, 6-36 
with ASTEN register, 5-6 
SWC bit 
exception summary parameter, 6-13 
SWPCTX (PALcode) instruction, 2-89 
with ASTSR register, 5-8 
SWPPAL (PALcode) instruction, 2—92 


Synchronous traps, 6-9 


data alignment, 6-15 

defined, 6-9 

program counter (PC) value, 6-15 
REI instruction with, 6-15 


System control block (SCB) 
arithmetic trap entry points, 6—27 
fault entry points, 6-27 
finding PFN, 5-19 
saved on stack frame, 6-7
structure of, 6-26 
with memory management faults, 3-13 
System control block base (SCBB) register, 5-19 
specifies PFN, 6-26 
System cycle counter (SCC) register 
reading, 2-17 


T 


T_floating data type, when data is unaligned, 6-28 
TB. See Translation buffer 


Translation buffer (TB) 
address space number with, 3-11 
fault on execute, 6-12 
fault on read, 6-11 
fault on write, 6-11 
granularity hint in PTE, 3-5 
with invalid PTEs, 3-12 
Translation buffer check (TBCHK) register 
described, 5-23 
with translation buffer, 3-12 
Translation buffer invalidate all (TBIA) register 
described, 5-24 
with translation buffer, 3-12 
Translation buffer invalidate all process (TBIAP) 
register 
described, 5-25 
with translation buffer, 3-12 
Translation buffer invalidate single (TBIS) register, 
5-26 
Translation not valid fault, 6-10 
service routine entry point, 6-27 
Traps. See Arithmetic traps 
U


Underflow trap, 6-14 
UNF bit, exception summary parameter, 6-13 
User read enable (URE), bit in PTE, 3-4 


User stack pointer (USP) register, 5-27 
in HWPCB, 4-2 
internal processor register, 5-1 
User write enable (UWE), bit in PTE, 3-4 


V 


Valid (V), bit in PTE, 3-6


Virtual address format, 3-2 








Virtual address space, 3-1, 3-2 
minimum and maximum, 3-2 
page size with, 3-2 

Virtual address translation, 3-10 


Virtual machine monitor (VMM), bit in PS register, 
6-6 


Virtual page table base (VPTB) register, 5—28 


W 


Watchpoints 


with fault on read, 6-11 
with fault on write, 6—11 


Who-Am-I (WHAMI) register Processor number, 
reading, 5-29 


WR_PS_SW (PALcode) instruction, 2-20 
WRITE_UNQ (PALcode) instruction, 2-81 
WTINT (PALcode) instruction, 2—94 
DIGITAL UNIX Software (II-B)


This section describes how the DIGITAL UNIX operating system relates to the Alpha architec- 
ture, and includes the following chapters: 


• Chapter 1, Introduction to DIGITAL UNIX (II-B)
• Chapter 2, PALcode Instruction Descriptions (II-B)
• Chapter 3, Memory Management (II-B)
• Chapter 4, Process Structure (II-B)
• Chapter 5, Exceptions and Interrupts (II-B)
Introduction to DIGITAL UNIX (II-B)
PALcode Instruction Descriptions (II-B)
2.2.8 Return from Trap, Fault or Interrupt
2.2.10 Swap IPL
2.2.15 Write Floating-Point Enable
2.2.16 Write Interprocessor Interrupt Request
2.2.17 Write Kernel Global Pointer
2.2.18 Write Machine Check Error Summary
2.2.19 Performance Monitoring Function
2.2.20 Write User Stack Pointer
Memory Management (II-B)

Chapter 1


Introduction to DIGITAL UNIX (II-B) 


The goal of this design is to provide an implementation-independent interface between the
hardware and DIGITAL UNIX. The interface needs to provide the required abstractions to
minimize the impact of different hardware implementations on the operating system. The
interface also needs to be low in overhead to support high-performance systems. Finally, the
interface needs to support only the features used by DIGITAL UNIX.

The register usage in this interface is based on the current calling standard used by DIGITAL
UNIX. If the calling standard changes, this interface will be changed accordingly. The current
calling standard register usage is shown in Table 1-1.


Table 1-1: DIGITAL UNIX Register Usage 


Register     Software
Name         Name         Use and Linkage

r0           v0           Used for expression evaluations and to hold integer function
                          results.
r1...r8      t0...t7      Temporary registers; not preserved across procedure calls.
r9...r14     s0...s5      Saved registers; their values must be preserved across
                          procedure calls.
r15          FP or s6     Frame pointer or a saved register.
r16...r21    a0...a5      Argument registers; used to pass the first six integer type
                          arguments; their values are not preserved across procedure calls.
r22...r25    t8...t11     Temporary registers; not preserved across procedure calls.
r26          ra           Contains the return address; used for expression evaluation.
r27          pv or t12    Procedure value or a temporary register.
r28          at           Assembler temporary register; not preserved across procedure
                          calls.
r29          GP           Global pointer.
r30          SP           Stack pointer.
r31          zero         Always has the value 0.
1.1 Programming Model

The programming model of the machine is the combination of the state visible either directly
via instructions or indirectly via actions of the machine. Tables 1-2 and 1-3 define the code
flow constants, state variables, terms, subroutines, and code flow terms that are used in the
rest of the document.


1.1.1 Code Flow Constants and Terms 


DIGITAL UNIX uses the following constants and terms.


Table 1-2: Code Flow Constants and Terms 


Term Meaning and value 


IPL = 2:0 The range 2:0 used in the PS to access the IPL field of the PS (PS<IPL>).


maxCPU The maximum number of processors in a given system. 


mode = 3 Used as a subscript in PS to select current mode (PS <mode>). 


opDec An attempt was made to execute a reserved instruction or execute a privileged
instruction in user mode.


pageSize Size of a page in an implementation in bytes. 


vaSize Size of virtual address in bits in a given implementation. 


1.1.2 Machine State Terms

Table 1-3: Machine State Terms

Term Meaning

ASN An implementation-dependent size register to hold the current address
space number (ASN). The size and existence of ASN is an implementation choice.

entArith <63:0> The arithmetic trap entry address register. The entArith is an internal
processor register that holds the dispatch address on an arithmetic trap.
There can be a hardware register for the entArith or the PALcode can
use private scratch memory.

entIF <63:0> The instruction fault or synchronous trap entry address register. The
entIF is an internal processor register that holds the dispatch address
on an instruction fault or synchronous trap. There can be a hardware
register for the entIF or the PALcode can use private scratch memory.

entInt <63:0> The interrupt entry address register. The entInt is an internal processor
register that holds the dispatch address on an interrupt. There can be a
hardware register for the entInt or the PALcode can use private scratch
memory.
entMM <63:0> The memory-management fault entry address register. The entMM is
an internal processor register that holds the dispatch address on a
memory-management fault. There can be a hardware register for the
entMM or the PALcode can use private scratch memory.

entSys <63:0> The system call entry address register. The entSys is an internal
processor register that holds the dispatch address on a callsys instruction.
There can be a hardware register for the entSys or the PALcode
can use private scratch memory.

entUna <63:0> The unaligned fault entry address register. The entUna is an internal
processor register that holds the dispatch address on an unaligned
fault. There can be a hardware register for the entUna or the PALcode
can use private scratch memory.

FEN <0> The floating-point enable register. The FEN is a one-bit register,
located at bit 0 of PCB[40], that is used to enable or disable
floating-point instructions. If a floating-point instruction is executed with
FEN equal to zero, a FEN fault is initiated.

instruction <31:0> The current instruction being executed. This is a fake register used in
the flows to CASE on different instructions.

intr_flag A per-processor state bit. The intr_flag bit is cleared if that processor
executes an rti or retsys instruction.

KGP <63:0> The kernel global pointer. The KGP is an internal processor register
that holds the kernel global pointer that is loaded into R29, the GP,
when an exception is initiated. There can be a hardware register for the
KGP or the PALcode can use private scratch memory.

KSP <63:0> The kernel stack pointer. The KSP is an internal processor register that
holds the kernel stack pointer while in user mode. There can be a
hardware register for the KSP or the storage space in the PCB can be used.

lock_flag <0> A one-bit register that is used by the load locked and store conditional
instructions.

MCES <2:0> The machine check error summary register. The MCES is a 3-bit
register that contains controls for machine check and system-correctable
error handling.

PC <63:0> The program counter. The PC is a pointer to the next instruction in the
flows. The low-order two bits of the PC always read as zero and writes
to them are ignored.

PCB The process control block. The PCB holds the state of the process.

PCBB <63:0> The process control block base address register. The PCBB holds the
address of the PCB for the current process.

PCC The PCC register consists of two 32-bit fields. The low-order 32 bits 
(PCC <31:0>) are an unsigned, wrapping counter, PCC_CNT. The 
high-order 32 bits (PCC <63:32>) are an offset, PCC_OFF. PCC_OFF 
is a value that, when added to PCC_CNT, gives the total PCC register 
count for this process, modulo 2**32. 


PME <62> The performance monitoring enable bit. The PME is a one-bit register, 
located at bit 62 of PCB[40], that alerts any performance monitoring 
software/hardware in the system that this process is to have its perfor- 
mance monitored. The implementation mechanism for this bit is not 
specified; it is implementation dependent (IMP). 


PS <3:0> The processor status. The PS is a four-bit register that stores the
current mode in bit <3> and stores the three-bit IPL in bits <2:0>. The
mode is 0 for kernel and 1 for user.


PTBR <63:0> The page table base register. The PTBR contains the physical page 
frame number (PFN) of the highest level page table. 


SP <63:0> Another name for R30. The SP points to the top of the current stack. 


PALcode only accesses the kernel stack. The kernel stack must be 
quadword aligned whenever PALcode reads or writes it. If the PAL- 
code accesses the kernel stack and the stack is not aligned, a ker- 
nel-stack-not-valid halt is initiated. Although PALcode does not 
access the user stack, that stack should also be at least quadword 
aligned for best performance. 


sysvalue <63:0> The system value register. The sysvalue holds the per-processor 
unique value. There can be a hardware register for the sysvalue regis- 
ter or the storage space in the PALcode scratch memory can be used. 


The sysvalue register can only be accessed by kernel mode code and
there is one sysvalue register per processor.

unique <63:0> The process unique value register. The unique register holds the
per-process unique value. There can be a hardware register for the
unique register or the storage space in the PCB can be used.


The unique register can be accessed by both user and kernel code and 
there is one unique register per process. 


USP <63:0> The user stack pointer. The USP is an internal processor register that 
holds the user stack pointer while in kernel mode. There can be a hard- 
ware register for the USP or the storage space in the PCB can be used. 


VPTPTR <63:0> The virtual page table pointer. The VPTPTR holds the virtual address 
of the first level page table. 


whami <63:0> The processor number of the current processor. This number is in the 
range 0...maxCPU-1. 
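
As a reading aid for the PS entry in Table 1-3, the following Python sketch packs and unpacks the four-bit PS. The function names ps_pack and ps_unpack are illustrative assumptions and are not part of the architecture; only the bit positions (mode in bit <3>, IPL in bits <2:0>) come from the table.

```python
def ps_pack(mode: int, ipl: int) -> int:
    """Build a 4-bit PS value: mode in bit <3>, IPL in bits <2:0>."""
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (kernel) or 1 (user)")
    if not 0 <= ipl <= 7:
        raise ValueError("IPL must fit in three bits")
    return (mode << 3) | ipl


def ps_unpack(ps: int) -> tuple:
    """Return (mode, IPL) taken from the low four bits of a PS value."""
    return (ps >> 3) & 1, ps & 0x7
```

With these helpers, user mode at IPL 0 packs to 8, which is the PS value the callsys and retsys flows in Chapter 2 use for user mode.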
1.2 \Revision History

Revision 7.0, November 10, 1997
1. Alpha AXP -> Alpha
2. DEC OSF/1 -> Digital UNIX

Revision 6.0, December 1994
1. OSF/1 -> Digital UNIX
2. Alpha -> Alpha AXP
3. Added MCES, PME, and PCC to machine state

Revision 1.0, May 12, 1992
1. First review distribution\
Chapter 2

PALcode Instruction Descriptions (II-B)

2.1 Unprivileged PALcode Instructions

Table 2-1 lists the DIGITAL UNIX PALcode unprivileged instruction mnemonics, names,
and the environment from which they can be called.

Table 2-1: Unprivileged PALcode Instructions

Mnemonic    Name                          Calling environment

bpt         Breakpoint trap               Kernel and user modes
bugchk      Bugcheck trap                 Kernel and user modes
callsys     System call                   User mode
clrfen      Clear floating-point enable   User mode
gentrap     Generate trap                 Kernel and user modes
imb         I-stream memory barrier       Kernel and user modes.
                                          Described in Common Architecture, Chapter 6.
rdunique    Read unique                   Kernel and user modes
urti        Return from user mode trap    User mode
wrunique    Write unique                  Kernel and user modes
2.1.1 Breakpoint Trap

Format:

bpt ! PALcode format

Operation:

temp ← PS
if (PS<mode> NE 0) then
    USP ← SP              ! Mode is user so switch to kernel
    SP ← KSP
    PS ← 0
endif
SP ← SP - {6 * 8}
(SP+00) ← temp
(SP+08) ← PC
(SP+16) ← GP
(SP+24) ← a0
(SP+32) ← a1
(SP+40) ← a2
a0 ← 0
GP ← KGP
PC ← entIF

Exceptions:

Kernel stack not valid

Instruction Mnemonics:

bpt Breakpoint trap

Description:

The breakpoint trap (bpt) instruction switches mode to kernel, builds a stack frame on the
kernel stack, loads the GP with the KGP, loads a value of 0 into a0, and dispatches to the
breakpoint code pointed to by the entIF register. The registers a1...a2 are UNPREDICTABLE
on entry to the trap handler. The saved PC at (SP+08) is the address of the instruction
following the trap instruction that caused the trap.
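
The bpt, bugchk, and gentrap flows differ only in the value loaded into a0 (0, 1, and 2 respectively). The Python sketch below models that shared flow. It is a simulation under stated assumptions, not PALcode: the machine state is a plain dict keyed by the register names used in the flows, memory is a dict from byte address to quadword, and trap_entry is a hypothetical helper name.

```python
FRAME_BYTES = 6 * 8  # six quadwords: PS, PC, GP, a0, a1, a2


def trap_entry(state: dict, mem: dict, code: int) -> None:
    """Model the bpt/bugchk/gentrap flow; code is the value loaded into a0."""
    saved_ps = state["PS"]
    if (saved_ps >> 3) & 1:              # PS<mode> NE 0: user, switch to kernel
        state["USP"] = state["SP"]
        state["SP"] = state["KSP"]
        state["PS"] = 0
    if state["SP"] % 8:                  # kernel stack must be quadword aligned
        raise RuntimeError("kernel stack not valid")
    state["SP"] -= FRAME_BYTES
    sp = state["SP"]
    frame = (saved_ps, state["PC"], state["GP"],
             state["a0"], state["a1"], state["a2"])
    for index, value in enumerate(frame):
        mem[sp + 8 * index] = value      # offsets +00, +08, ..., +40
    state["a0"] = code
    state["GP"] = state["KGP"]
    state["PC"] = state["entIF"]
```

Calling trap_entry(state, mem, 0) from user mode leaves the saved user PS at (SP+00) and the old SP in USP, which is the information the rti flow later needs to return.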
2.1.2 Bugcheck Trap

Format:

bugchk ! PALcode format

Operation:

temp ← PS
if (PS<mode> NE 0) then
    USP ← SP              ! Mode is user so switch to kernel
    SP ← KSP
    PS ← 0
endif
SP ← SP - {6 * 8}
(SP+00) ← temp
(SP+08) ← PC
(SP+16) ← GP
(SP+24) ← a0
(SP+32) ← a1
(SP+40) ← a2
a0 ← 1
GP ← KGP
PC ← entIF

Exceptions:

Kernel stack not valid

Instruction Mnemonics:

bugchk Bugcheck trap

Description:

The bugcheck trap (bugchk) instruction switches mode to kernel, builds a stack frame on the
kernel stack, loads the GP with the KGP, loads a value of 1 into a0, and dispatches to the
breakpoint code pointed to by the entIF register. The registers a1...a2 are UNPREDICTABLE
on entry to the trap handler. The saved PC at (SP+08) is the address of the instruction
following the trap instruction that caused the trap.
2.1.3 System Call

Format:

callsys ! PALcode format

Operation:

if (PS<mode> EQ 0) then
    machineCheck
endif
USP ← SP
SP ← KSP
PS ← 0                    ! Mode=kernel
SP ← SP - {6*8}
(SP+00) ← 8               ! PS of mode=user, IPL=0
(SP+08) ← PC
(SP+16) ← GP
GP ← KGP
PC ← entSys

Exceptions:

Machine check - invalid kernel mode callsys
Kernel stack not valid

Instruction Mnemonics:

callsys System call

Description:

The system call (callsys) instruction is supported only from user mode. (Issuing a callsys from
kernel mode causes a machine check exception.)

The callsys instruction switches mode to kernel and builds a callsys stack frame. The GP is
loaded with the KGP. The exception then dispatches to the system call code pointed to by the
entSys register. On entry to the callsys code, the scratch registers t0 and t8...t11 are
UNPREDICTABLE.
2.1.4 Clear Floating-Point Enable

Format:

clrfen ! PALcode format

Operation:

FEN ← 0
(PCBB+40)<0> ← 0

Exceptions:

None

Instruction Mnemonics:

clrfen Clear floating-point enable

Description:

The clear floating-point enable (clrfen) instruction writes a zero to the floating-point enable
register and to the PCB at offset (PCBB+40)<0>. On return from the clrfen instruction, the
scratch registers t0 and t8...t11 are UNPREDICTABLE.
2.1.5 Generate Trap

Format:

gentrap ! PALcode format

Operation:

temp ← PS
if (PS<mode> NE 0) then
    USP ← SP              ! Mode is user so switch to kernel
    SP ← KSP
    PS ← 0
endif
SP ← SP - {6 * 8}
(SP+00) ← temp
(SP+08) ← PC
(SP+16) ← GP
(SP+24) ← a0
(SP+32) ← a1
(SP+40) ← a2
a0 ← 2
GP ← KGP
PC ← entIF

Exceptions:

Kernel stack not valid

Instruction Mnemonics:

gentrap Generate trap

Description:

The generate trap (gentrap) instruction switches mode to kernel, builds a stack frame on the
kernel stack, loads the GP with the KGP, loads a value of 2 into a0, and dispatches to the
breakpoint code pointed to by the entIF register. The registers a1...a2 are UNPREDICTABLE
on entry to the trap handler. The saved PC at (SP+08) is the address of the instruction
following the trap instruction that caused the trap.
2.1.6 Read Unique Value

Format:

rdunique ! PALcode format

Operation:

v0 ← unique

Exceptions:

None

Instruction Mnemonics:

rdunique Read unique value

Description:

The read unique value (rdunique) instruction returns the process unique value in v0. The
write unique value (wrunique) instruction, described in Section 2.1.8, sets the process unique
value register.
2.1.7 Return from User Mode Trap

Format:

urti ! PALcode format

Operation:

if (PS<mode> EQ 0) then
    {machineCheck}
endif
if (SP<5:0> NE 0) then
    {Initiate illegal operand exception}
endif
tempps ← (SP+16)
if (( tempps<mode> EQ 0 ) OR ( tempps<IPL> NE 0 )) then
    {Initiate illegal operand exception}
endif
at ← (SP+0)
tempsp ← (SP+8)
temppc ← (SP+24)
GP ← (SP+32)
a0 ← (SP+40)
a1 ← (SP+48)
a2 ← (SP+56)
intr_flag = 0             ! Clear the interrupt flag
lock_flag = 0             ! Clear the load lock flag
SP ← tempsp
PC ← temppc

Exceptions:

Machine check - invalid kernel mode urti
Illegal operand
Translation not valid
Access violation
Fault on read

Instruction Mnemonics:

urti Return from user mode trap

Description:

The return from user mode trap (urti) instruction pops registers (a0...a2, and GP), the new
user at, SP, PC, and the PS, from the user stack.
2.1.8 Write Unique Value

Format:

wrunique ! PALcode format

Operation:

unique ← a0

Exceptions:

None

Instruction Mnemonics:

wrunique Write unique value

Description:

The write unique value (wrunique) instruction sets the process unique register to the value
passed in a0. The read unique value (rdunique) instruction, described in Section 2.1.6, returns
the process unique value.
2.2 Privileged PALcode Instructions

The privileged DIGITAL UNIX PALcode instructions (Table 2-2) provide an abstracted
interface to control the privileged state of the machine.

Table 2-2: Privileged PALcode Instructions

Mnemonic     Name

cflush       Cache flush
cserve       Console service
draina       Drain aborts. Described in Common Architecture, Chapter 6.
halt         Halt the processor. Described in Common Architecture, Chapter 6.
rdmces       Read machine check error summary register
rdps         Read processor status
rdusp        Read user stack pointer
rdval        Read system value
retsys       Return from system call
rti          Return from trap, fault, or interrupt
swpctx       Swap process context
swppal       Swap PALcode image
swpipl       Swap IPL
tbi          TB (translation buffer) invalidate
whami        Who am I
wrent        Write system entry address
wrfen        Write floating-point enable
wripir       Write interprocessor interrupt request
wrkgp        Write kernel global pointer
wrmces       Write machine check error summary register
wrperfmon    Performance monitoring function
wrusp        Write user stack pointer
wrval        Write system value
wrvptptr     Write virtual page table pointer
wtint        Wait for interrupt
2.2.1 Cache Flush

Format:

cflush ! PALcode format


Operation: 


! a0 contains the page frame number (PFN) 
! of the page to be flushed 


IF PS<mode> EQ 1 THEN 
{Initiate opDec fault} 


{Flush page out of cache(s)} 


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


cflush Cache flush 


Description: 


The cflush instruction may be used to flush an entire physical page specified by the PFN in a0 
from any data caches associated with the current processor. All processors must implement 
this instruction. 


On processors that implement a backup power option that maintains only the contents of
memory if a powerfail occurs, this instruction is used by the powerfail interrupt handler to
force data written by the handler to the battery backed-up main memory. After a cflush, the
first subsequent load (on the same processor) to an arbitrary address in the target page is
either fetched from physical memory or from the data cache of another processor.


In some multiprocessor systems, cflush is not sufficient to ensure that the data are actually 
written to memory and not exchanged between processor caches. Additional platform-specific 
cooperation between the powerfail interrupt handlers executing on each processor may be 
required. 


On systems that implement other backup power options (including none), cflush may return 
without affecting the data cache contents. \On systems that implement some form of NVRAM, 
PALcode must ensure that the given page is in memory before returning to the caller, regard- 
less of the backup power option (including none).\ 


To order cflush properly with respect to preceding writes, an MB instruction is needed before 
the cflush; to order cflush properly with respect to subsequent reads, an MB instruction is 
needed after the cflush. 
2.2.2 Console Service

Format:

cserve ! PALcode format


Operation: 


! implementation specific 


if PS<mode> EQ 1 then 
{initiate opDec fault} 


else 
{implementation-dependent action} 


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


cserve Console service 


Description: 


This instruction is specific to each PALcode and console implementation and is not intended 
for operating system use. 


\The console can implement the generic console I/O routines by using the cserve instruction to
transition to and from console I/O mode. The cserve instruction is primarily used in the
generic console I/O callback routines for virtual-to-physical address translation. Since the
PALcode image used by the operating system can differ from that used by the console, the
console might not have direct knowledge of the active memory management policy.
Therefore, the console uses cserve to get the physical address.\
2.2.3 Read Machine Check Error Summary

Format:

rdmces ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
v0 ← MCES

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

rdmces Read machine check error summary

Description:

The read machine check error summary (rdmces) instruction returns the MCES (machine
check error summary) register in v0. On return from the rdmces instruction, registers t0 and
t8...t11 are UNPREDICTABLE.
2.2.4 Read Processor Status

Format:

rdps ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
v0 ← PS

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

rdps Read processor status

Description:

The read processor status (rdps) instruction returns the PS in v0. On return from the rdps
instruction, registers t0 and t8...t11 are UNPREDICTABLE.
2.2.5 Read User Stack Pointer

Format:

rdusp ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
v0 ← USP

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

rdusp Read user stack pointer

Description:

The read user stack pointer (rdusp) instruction returns the user stack pointer in v0. The user
stack pointer is written by the wrusp instruction, described in Section 2.2.20. On return from
the rdusp instruction, registers t0 and t8...t11 are UNPREDICTABLE.
2.2.6 Read System Value

Format:

rdval ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
v0 ← sysvalue

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

rdval Read system value

Description:

The read system value (rdval) instruction returns the sysvalue in v0, allowing access to a
64-bit per-processor value for use by the operating system. On return from the rdval
instruction, registers t0 and t8...t11 are UNPREDICTABLE.
2.2.7 Return from System Call

Format:

retsys ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
tmp ← (SP+08)
GP ← (SP+16)
KSP ← SP + {6*8}
SP ← USP
intr_flag = 0             ! Clear the interrupt flag
lock_flag = 0             ! Clear the load lock flag
PS ← 8                    ! Mode=user
PC ← tmp

Exceptions:

Opcode reserved to DIGITAL
Kernel stack not valid (halt)

Instruction Mnemonics:

retsys Return from system call

Description:

The return from system call (retsys) instruction pops the return address and the user mode
global pointer from the kernel stack. It then saves the kernel stack pointer, sets the mode to
user, sets the IPL to zero, and enters the user mode code at the address popped off the stack.
On return from the retsys instruction, registers t0 and t8...t11 are UNPREDICTABLE.
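
The callsys and retsys flows are two halves of one round trip, which the following Python sketch illustrates. It is a model under stated assumptions rather than PALcode: state is a dict keyed by the register names in the flows, memory is a dict from byte address to quadword, and MachineCheck and OpDecFault are hypothetical exception classes standing in for the machine check and opDec fault. The GP is saved at (SP+16), the offset from which retsys reloads it.

```python
class MachineCheck(Exception):
    """Stand-in for the machine check raised by a kernel mode callsys."""


class OpDecFault(Exception):
    """Stand-in for the opDec fault raised by a user mode retsys."""


def callsys(s: dict, mem: dict) -> None:
    if (s["PS"] >> 3) & 1 == 0:          # PS<mode> EQ 0: already in kernel mode
        raise MachineCheck("invalid kernel mode callsys")
    s["USP"] = s["SP"]
    s["SP"] = s["KSP"] - 6 * 8           # switch stacks, allocate the frame
    s["PS"] = 0                          # mode=kernel
    mem[s["SP"] + 0] = 8                 # PS of mode=user, IPL=0
    mem[s["SP"] + 8] = s["PC"]
    mem[s["SP"] + 16] = s["GP"]
    s["GP"] = s["KGP"]
    s["PC"] = s["entSys"]


def retsys(s: dict, mem: dict) -> None:
    if (s["PS"] >> 3) & 1:               # PS<mode> EQ 1: user mode
        raise OpDecFault("retsys is privileged")
    return_pc = mem[s["SP"] + 8]
    s["GP"] = mem[s["SP"] + 16]
    s["KSP"] = s["SP"] + 6 * 8           # pop the frame, save the kernel stack
    s["SP"] = s["USP"]
    s["intr_flag"] = 0
    s["lock_flag"] = 0
    s["PS"] = 8                          # mode=user, IPL=0
    s["PC"] = return_pc
```

Running callsys followed immediately by retsys returns SP, GP, PC, PS, and KSP to their starting values, which is a quick consistency check on the frame offsets.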
2.2.8 Return from Trap, Fault or Interrupt

Format:

rti ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
tempps ← (SP+0)
temppc ← (SP+8)
GP ← (SP+16)
a0 ← (SP+24)
a1 ← (SP+32)
a2 ← (SP+40)
SP ← SP + {6 * 8}
if { tempps<3> EQ 1 } then
    KSP ← SP              ! New mode is user
    SP ← USP
    tempps ← 8
endif
intr_flag = 0             ! Clear the interrupt flag
lock_flag = 0             ! Clear the load lock flag
PS ← tempps<3:0>          ! Set new PS
PC ← temppc

Exceptions:

Opcode reserved to DIGITAL
Kernel stack not valid (halt)

Instruction Mnemonics:

rti Return from trap, fault, or interrupt

Description:

The return from fault, trap, or interrupt (rti) instruction pops registers (a0...a2, and GP), the
PC, and the PS, from the kernel stack. If the new mode is user, the kernel stack is saved and
the user stack is restored.
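
The rti flow can be modeled in the same style. The sketch below is illustrative only: the dict-based state and memory and the use of PermissionError as a stand-in for the opDec fault are assumptions, while the frame offsets and the forced PS value of 8 on a return to user mode follow the Operation text.

```python
def rti(s: dict, mem: dict) -> None:
    """Pop a six-quadword trap frame from the kernel stack."""
    if (s["PS"] >> 3) & 1:
        raise PermissionError("opDec fault: rti is privileged")
    sp = s["SP"]
    tempps = mem[sp + 0]
    temppc = mem[sp + 8]
    s["GP"] = mem[sp + 16]
    s["a0"] = mem[sp + 24]
    s["a1"] = mem[sp + 32]
    s["a2"] = mem[sp + 40]
    s["SP"] = sp + 6 * 8
    if (tempps >> 3) & 1:                # new mode is user
        s["KSP"] = s["SP"]               # save the kernel stack pointer
        s["SP"] = s["USP"]               # restore the user stack pointer
        tempps = 8                       # user mode always resumes at IPL 0
    s["intr_flag"] = 0
    s["lock_flag"] = 0
    s["PS"] = tempps & 0xF
    s["PC"] = temppc
```

A frame whose saved PS has bit <3> clear keeps the processor on the kernel stack and restores the saved IPL; a frame with bit <3> set switches stacks and discards any IPL bits in the saved PS.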
2.2.9 Swap Process Context

Format:

swpctx ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
(PCBB) ← SP               ! Save current state
(PCBB+8) ← USP
tmp ← PCC
tmp1 ← tmp<31:0> + tmp<63:32>
(PCBB+24)<31:0> ← tmp1<31:0>

v0 ← PCBB                 ! Return old PCBB
PCBB ← a0                 ! Switch PCBB
SP ← (PCBB)               ! Restore new state
USP ← (PCBB+8)
oldPTBR ← PTBR
PTBR ← (PCBB+16)
tmp1 ← (PCBB+24)
PCC<63:32> ← {tmp1 - tmp}<31:0>
FEN ← (PCBB+40)
if {process unique register implemented} then
    (v0+32) ← unique
    unique ← (PCBB+32)
endif
if {ASN implemented} then
    ASN ← tmp1<63:32>
else
    if (oldPTBR NE PTBR) then
        {Invalidate all TB entries with ASM=0}
    endif
endif

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

swpctx Swap process context

Description:

The swap process context (swpctx) instruction saves the current process data in the current
PCB. Then swpctx switches to the PCB passed in a0 and loads the new process context. The
old PCBB is returned in v0.

The process context and the PCB are described in Chapter 4.

On return from the swpctx instruction, registers t0, t8...t11, and a0 are UNPREDICTABLE.
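
The PCC arithmetic in the swpctx flow is easy to misread, so here is a small Python sketch of just that part. The two function names are hypothetical; the arithmetic follows the flow: the outgoing process saves PCC_CNT + PCC_OFF modulo 2**32, and the incoming process gets a new PCC_OFF chosen so that the running counter plus the offset equals its saved count.

```python
MASK32 = 0xFFFFFFFF


def pcc_value_to_save(pcc: int) -> int:
    """Value written to (PCBB+24)<31:0>: PCC<31:0> + PCC<63:32>, modulo 2**32."""
    return ((pcc & MASK32) + ((pcc >> 32) & MASK32)) & MASK32


def pcc_off_to_load(new_pcb24: int, pcc: int) -> int:
    """New PCC<63:32>: the low 32 bits of {(PCBB+24) of the new PCB - old PCC}."""
    return (new_pcb24 - pcc) & MASK32
```

Because only the low 32 bits of the difference are kept, the upper half of the quadword at (PCBB+24), which the flow also uses as the ASN source, does not affect the offset.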


DIGITAL Restricted Distribution 


2.2.10 Swap IPL

Format:

swpipl ! PALcode format

Operation:

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
v0 ← PS<IPL>
PS<IPL> ← a0<2:0>

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

swpipl Swap IPL

Description:

The swap IPL (swpipl) instruction returns the current value of the PS<IPL> bits in v0 and sets
the IPL to the value passed in a0. On return from the swpipl instruction, registers t0, t8...t11,
and a0 are UNPREDICTABLE.
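
Because swpipl returns the previous IPL, a caller can raise the IPL and later restore it with a second call. The Python sketch below models that behavior. The dict-based state and the use of PermissionError for the opDec fault are assumptions made for illustration.

```python
def swpipl(s: dict, a0: int) -> int:
    """Return the old PS<IPL> and set PS<IPL> from a0<2:0>."""
    if (s["PS"] >> 3) & 1:
        raise PermissionError("opDec fault: swpipl is privileged")
    old_ipl = s["PS"] & 0x7
    s["PS"] = (s["PS"] & ~0x7) | (a0 & 0x7)
    return old_ipl


# Usage pattern: raise the IPL, do the protected work, then restore it.
state = {"PS": 2}
saved = swpipl(state, 7)
# ... work that must not be interrupted at lower levels ...
swpipl(state, saved)
```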
2.2.11 Swap PALcode Image

Format:

swppal ! PALcode format

Operation:

! a0 contains the new PALcode identifier
! a1:a5 contain implementation-specific entry parameters
! v0 receives the following status:
!   0 success (PALcode was switched)
!   1 unknown PALcode variant
!   2 known PALcode variant, but PALcode not loaded

if (PS<mode> EQ 1) then
    {Initiate opDec fault}
else
    if {a0 < 256} then
    begin
        if {a0 invalid} then
            v0 ← 1
            {return}
        else if {PALcode not loaded} then
            v0 ← 2
            {return}
        else
            tmp1 ← {PALcode base}
    end
    else
        tmp1 = a0

    {flush instruction cache}
    {invalidate all translation buffers}
    {perform additional PALcode variant-specific initialization}
    {transfer control to PALcode entry at physical address in tmp1}

Exceptions:

Opcode reserved to DIGITAL

Instruction Mnemonics:

swppal Swap PALcode image

Description:

The swap PALcode image (swppal) instruction causes the current (active) PALcode to be
replaced by the specified new PALcode image. The swppal instruction is intended for use by
operating systems only during bootstraps and by consoles during transitions to console I/O
mode.

The PALcode descriptor contained in a0 is interpreted as either a PALcode variant or the base
physical address of the new PALcode image. If a variant, the PALcode image must have been
loaded previously. No PALcode loading occurs as a result of this instruction.

After successful PALcode switching, the register contents are determined by the parameters
passed in a1...a5 or are UNPREDICTABLE. A common parameter is the address of a new
PCB. In this case, the stack pointer register and PTBR are determined by the contents of that
PCB; the contents of other registers such as a0...a5 may be UNPREDICTABLE.

See Console Interface Architecture for information on using this instruction.
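
The way swppal interprets the descriptor in a0 can be summarized in a few lines of Python. This is a sketch under assumptions: resolve_palcode, known_variants, and loaded_bases are hypothetical names, and how an implementation actually tracks known and loaded variants is not specified here. Only the threshold of 256 and the status codes 0, 1, and 2 come from the Operation text.

```python
def resolve_palcode(a0: int, known_variants: set, loaded_bases: dict) -> tuple:
    """Return (status, base): status as placed in v0, base as held in tmp1."""
    if a0 < 256:                         # a0 names a PALcode variant
        if a0 not in known_variants:
            return 1, None               # unknown PALcode variant
        if a0 not in loaded_bases:
            return 2, None               # known variant, but not loaded
        return 0, loaded_bases[a0]
    return 0, a0                         # a0 is the base physical address
```

Only a status of 0 leads on to the cache flush, TB invalidation, and transfer of control; statuses 1 and 2 return to the caller with the current PALcode still active.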
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2.2.12 TB Invalidate 


Format: 


tbi ! PALcode format 


Operation: 
if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
case a0 begin
    1:  ! tbisi
        {Invalidate ITB entry for va=a1}
        break;
    2:  ! tbisd
        {Invalidate DTB entry for va=a1}
        break;
    3:  ! tbis
        {Invalidate both ITB and DTB entry for va=a1}
        break;
    -1: ! tbiap
        {Invalidate all TB entries with ASM=0}
        break;
    -2: ! tbia
        {Flush all TBs}
        break;
    otherwise:
        break;
endcase


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


tbi TB (translation buffer) invalidate 


Description: 


The TB invalidate (tbi) instruction removes specified entries from the I and D translation
buffers (TBs) when the mapping changes. The tbi instruction removes specific entry types based
on a CASE selection of the value passed in register a0. On return from the tbi instruction,
registers t0, t8...t11, a0, and a1 are UNPREDICTABLE.
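

The CASE selection on a0 can be modelled in a few lines of Python. This is an illustrative sketch only: the dictionaries standing in for the ITB and DTB, the per-entry asm flag, and the OpDecFault exception name are assumptions made for the example and are not defined by the architecture.

```python
class OpDecFault(Exception):
    """Hypothetical stand-in for the opDec fault taken in user mode."""


def tbi(itb, dtb, a0, a1=0, ps_mode=0):
    """Model of tbi. itb and dtb map a virtual address to {'asm': 0 or 1}."""
    if ps_mode == 1:                 # PS<mode> EQ 1: user mode
        raise OpDecFault("tbi")
    if a0 == 1:                      # tbisi: single ITB entry
        itb.pop(a1, None)
    elif a0 == 2:                    # tbisd: single DTB entry
        dtb.pop(a1, None)
    elif a0 == 3:                    # tbis: single entry in both TBs
        itb.pop(a1, None)
        dtb.pop(a1, None)
    elif a0 == -1:                   # tbiap: every entry with ASM=0
        for tb in (itb, dtb):
            for va in [v for v, entry in tb.items() if entry["asm"] == 0]:
                del tb[va]
    elif a0 == -2:                   # tbia: flush both TBs
        itb.clear()
        dtb.clear()
    # any other value of a0 falls through and does nothing


itb = {0x2000: {"asm": 0}, 0x4000: {"asm": 1}}
dtb = {0x2000: {"asm": 0}, 0x6000: {"asm": 0}}
tbi(itb, dtb, 2, 0x2000)   # removes only the DTB entry for 0x2000
tbi(itb, dtb, -1)          # removes the remaining ASM=0 entries
```

After the two calls, only the ASM=1 entry at 0x4000 remains in the modelled ITB.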
2.2.13 Who Am I
Format: 


whami ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

v0 ← whami


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


whami Who am I 


Description: 


The who am I (whami) instruction returns the processor number for the current processor in
v0. The processor number is in the range 0 to the number of processors minus one
(0...maxCPU-1) that can be configured in the system. On return from the whami instruction,
registers t0 and t8...t11 are UNPREDICTABLE.
2.2.14 Write System Entry Address
Format:


wrent ! PALcode format 


Operation: 
if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
case a1 begin

    0:  ! Write the EntInt:
        entInt ← a0
        break;

    1:  ! Write the EntArith:
        entArith ← a0
        break;

    2:  ! Write the EntMM:
        entMM ← a0
        break;

    3:  ! Write the EntIF:
        entIF ← a0
        break;

    4:  ! Write the EntUna:
        entUna ← a0
        break;

    5:  ! Write the EntSys:
        entSys ← a0
        break;

    otherwise:
        break;
endcase;
Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrent Write system entry address 


Description: 


The write system entry address (wrent) instruction determines the specific system entry point
based on a CASE selection of the value passed in register a1. The wrent instruction then sets
the virtual address of the specified system entry point to the value passed in a0.


For best performance, all the addresses should be kseg addresses. (See Chapter 3 for a
definition of kseg addresses.) On return from the wrent instruction, registers t0, t8...t11, a0,
and a1 are UNPREDICTABLE.
2.2.15 Write Floating-Point Enable


Format: 


wrfen ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

FEN ← a0<0>

(PCBB+40)<0> ← a0 AND 1


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrfen Write floating-point enable


Description: 


The write floating-point enable (wrfen) instruction writes bit zero of the value passed in a0 to 
the floating-point enable register. The wrfen instruction also writes the value for FEN to the 
PCB at offset (PCBB+40)<0>. On return from the wrfen instruction, registers t0, t8...t11, and
a0 are UNPREDICTABLE.
2.2.16 Write Interprocessor Interrupt Request
Format: 


wripir ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

IPIR ← a0


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wripir Write interprocessor interrupt request 


Description: 


The write interprocessor interrupt request (wripir) instruction generates an interprocessor 
interrupt on the processor number passed in register a0. The interrupt request is recorded on 
the target processor and is initiated when the proper enabling conditions are present. On 
return from wripir, registers t0, t8...t11, and a0 are UNPREDICTABLE.


Programming Note: 
The interrupt need not be initiated before the next instruction is executed on the requesting 
processor, even if the requesting processor is also the target processor for the request. 
2.2.17 Write Kernel Global Pointer
Format: 


wrkgp ! PALcode format


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

KGP ← a0


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrkgp Write kernel global pointer


Description: 


The write kernel global pointer (wrkgp) instruction writes the value passed in a0 to the kernel 
global pointer (KGP) internal register. The KGP is used to load the GP on exceptions. On 
return from the wrkgp instruction, registers t0, t8...t11, and a0 are UNPREDICTABLE.
2.2.18 Write Machine Check Error Summary
Format: 


wrmces ! PALcode format 


Operation: 

if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 
if (a0<0> EQ 1) then MCES<0> ← 0
if (a0<1> EQ 1) then MCES<1> ← 0
if (a0<2> EQ 1) then MCES<2> ← 0
MCES<3> ← a0<3>
MCES<4> ← a0<4>


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrmces Write machine check error summary 


Description: 


The write machine check error summary (wrmces) instruction clears the machine check in 
progress bit and clears the processor- or system-correctable error in progress bit in the MCES 
register. The instruction also sets or clears the processor- or system-correctable error reporting 
enabled bit in the MCES register. On return from the wrmces instruction, registers t0 and
t8...t11 are UNPREDICTABLE.
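

The Operation block treats MCES<2:0> as write-one-to-clear bits and copies MCES<4:3> directly from a0. The following Python sketch restates that behavior; the function name and the use of a plain integer to hold MCES are assumptions made for illustration.

```python
def wrmces(mces, a0):
    """Return the MCES value that results from wrmces with argument a0."""
    for bit in (0, 1, 2):            # writing 1 clears the bit, writing 0 leaves it
        if (a0 >> bit) & 1:
            mces &= ~(1 << bit)
    for bit in (3, 4):               # these bits always take the value in a0
        mces = (mces & ~(1 << bit)) | (a0 & (1 << bit))
    return mces
```

Because bits 3 and 4 are always overwritten, a caller that only wants to clear one of the low three bits must also pass the current values of bits 3 and 4 in a0 to leave them unchanged.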
2.2.19 Performance Monitoring Function


Format: 
wrperfmon ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then
    {Initiate opDec fault}
endif
! a0 contains implementation specific input values
! a1 contains implementation specific output values
! v0 may return implementation specific values
! Operations and actions taken are implementation specific


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrperfmon Performance monitoring 


Description: 


The performance monitoring instruction (wrperfmon) alerts any performance monitoring
software/hardware in the system to monitor the performance of this process. The wrperfmon
function arguments and actions are platform and chip dependent, and when defined for an 
implementation, are described in Appendix E. 


Registers a0 and a1 contain implementation-specific input values. Implementation-specific
values may be returned in register v0. On return from the wrperfmon instruction, registers a0, a1,
t0, and t8...t11 are UNPREDICTABLE.
2.2.20 Write User Stack Pointer


Format: 


wrusp ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

USP ← a0


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrusp Write user stack pointer 


Description: 


The write user stack pointer (wrusp) instruction writes the value passed in a0 to the user stack 
pointer. On return from the wrusp instruction, registers t0, t8...t11, and a0 are
UNPREDICTABLE.
2.2.21 Write System Value
Format: 


wrval ! PALcode format


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

sysvalue ← a0


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrval Write system value 


Description: 


The write system value (wrval) instruction writes the value passed in a0 to a 64-bit system 
value register. The combination of wrval with the rdval instruction, described in Section 2.2.6, 
allows access by the operating system to a 64-bit per-processor value. On return from the 
wrval instruction, registers t0, t8...t11, and a0 are UNPREDICTABLE.
2.2.22 Write Virtual Page Table Pointer


Format: 


wrvptptr ! PALcode format 


Operation: 


if (PS<mode> EQ 1) then 
{Initiate opDec fault} 

endif 

VPTPTR ← a0


Exceptions: 


Opcode reserved to DIGITAL 


Instruction Mnemonics: 


wrvptptr Write virtual page table pointer 


Description: 


The write virtual page table pointer (wrvptptr) instruction writes the pointer passed in a0 to 
the virtual page table pointer register (VPTPTR). The VPTPTR is described in Section 3.6.2. 
On return from the wrvptptr instruction, registers t0, t8...t11, and a0 are UNPREDICTABLE. 
2.2.23 Wait For Interrupt


Format: 


wtint ! PALcode format 


Operation: 


! a0 contains the maximum number of interval clock ticks to skip
! v0 receives the number of interval clock ticks actually skipped


IF (implemented)
BEGIN
    IF {Implementation supports skipping multiple clock interrupts} THEN
        {Ticks to skip ← a0}

    {Wait no longer than any non-clock interrupt or the first clock
     interrupt after ticks to skip ticks have been skipped}

    IF {Implementation supports skipping multiple clock interrupts} THEN
        v0 ← number of interval clock ticks actually skipped
    ELSE
        v0 ← 0
END
ELSE
    v0 ← 0
{return}


Exceptions: 


Opcode reserved to DIGITAL. 


Instruction Mnemonics: 


wtint Wait for interrupt 


Description: 
The wait for interrupt instruction (wtint) requests that, if possible, the PALcode wait for the
first of either of the following conditions before returning:

• Any interrupt other than a clock tick

• The first clock tick after a specified number of clock ticks has been skipped


The wtint instruction returns in v0 the number of clock ticks that are skipped. The number
returned in v0 is zero on hardware platforms that implement this instruction, but where it is
not possible to skip clock ticks.


The operating system can specify a full 64-bit integer value in a0 as the maximum number of
interval clock ticks to skip. A value of zero in a0 causes no clock ticks to be skipped.


Note the following if specifying in a0 the maximum number of interval clock ticks to skip: 


• Adherence to a specified value in a0 is at the discretion of the PALcode; the PALcode
may complete execution of wtint and proceed to the next instruction at any time up to
the specified maximum, even if no interrupt or interval-clock tick has occurred. That is,
wtint may return before all requested clock ticks are skipped.


• The PALcode must complete execution of wtint if an interrupt occurs or if an
interval-clock tick occurs after the requested number of interval-clock ticks has been
skipped.


In a multiprocessor environment, only the issuing processor is affected by an issued wtint 
instruction. 


The counter, PCC, may increment at a lower rate or may stop entirely during wtint execution. 
This side effect is implementation dependent. 
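

The return value of wtint can be illustrated with a small Python model. This sketch assumes the longest wait the description permits (PALcode is free to return earlier), and it represents incoming interrupts as a list of the strings "clock" and "other"; the function name, parameters, and event encoding are all hypothetical.

```python
def wtint(a0, events, implemented=True, can_skip=True):
    """Return the v0 value for a wait that sees `events` in order.

    a0 is the maximum number of interval clock ticks to skip.
    events is a sequence of "clock" or "other" interrupt arrivals.
    """
    if not implemented:
        return 0
    ticks_to_skip = a0 if can_skip else 0
    skipped = 0
    for event in events:
        if event != "clock":          # any non-clock interrupt ends the wait
            break
        if skipped >= ticks_to_skip:  # first clock tick after the skipped ones
            break
        skipped += 1
    return skipped
```

For example, with a0 = 5 and a non-clock interrupt arriving after one clock tick, the model returns 1 rather than 5.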




2.3 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP → Alpha
2. DEC OSF/1 → Digital UNIX
3. Added eco 91, wtint instruction
4. Added eco 92, urti instruction
5. Added eco 101, clrfen instruction


Revision 6.0, December 1994
1. OSF/1 → Digital UNIX
2. Alpha → Alpha AXP
3. Added t0 scratch to retsys and callsys instructions
4. Added cflush instruction, ECO 52
5. Added cserve instruction
6. Added swppal instruction
7. Added rdmces instruction
8. Added wrmces instruction
9. Added wrperfmon instruction


Revision 1.0, May 12, 1992
1. First review distribution


Chapter 3


Memory Management (II-B) 


3.1 Virtual Address Spaces 


A virtual address is a 64-bit unsigned integer that specifies a byte location within the virtual
address space. Implementations subset the supported address space to one of several sizes, as
a function of page size and page table depth. The minimal supported virtual address size is 43
bits. If an implementation supports fewer than 64 virtual address bits, it must check that all
the VA<63:vaSize> bits are equal to VA<vaSize-1>. This gives two disjoint ranges for
valid virtual addresses. For example, for a 43-bit virtual address space, valid virtual address
ranges are 0...3FFFFFFFFFF₁₆ and FFFFFC0000000000₁₆...FFFFFFFFFFFFFFFF₁₆.
Access to a virtual address outside an implementation's valid virtual address range causes an
access-violation fault.
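

The sign-extension check above can be written as a short Python sketch. The helper names sext and is_valid_va are chosen for this example only; Python integers are unbounded, so the 64-bit width is applied explicitly with a mask.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a 64-bit quantity."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value >> (bits - 1):              # top implemented bit is set
        value |= MASK64 & ~low_mask      # replicate it into the upper bits
    return value


def is_valid_va(va, va_size):
    """True if VA<63:vaSize> all equal VA<vaSize-1>."""
    return sext(va, va_size) == (va & MASK64)
```

With va_size = 43, the function accepts exactly the two ranges given in the example above and rejects every address between them.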


The virtual address space is divided into three segments.
The two bits, VA<vaSize-1:vaSize-2>, select a segment as shown in Table 3-1.


Table 3-1: Virtual Address Space Segments


VA<vaSize-1:vaSize-2>   Name   Mapping                        Access Control

0x                      seg0   3- or 4-level page tables      Programmed in PTE
10                      kseg   PA ← SEXT(VA<(vaSize-3):0>)    Kernel Read/Write
11                      seg1   3- or 4-level page tables      Programmed in PTE


For kseg, the relocation, sharing, and protection are fixed. The base of kseg is located at
LEFT_SHIFT(FFFFFC0000000000₁₆, (vaSize-43)).
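

As an illustration of the segment selection in Table 3-1 and of the kseg base formula, the following Python sketch classifies an address and computes the base for a given vaSize. The function names are hypothetical, the address is assumed to have already passed the sign-extension check, and the shift result is truncated to 64 bits explicitly.

```python
MASK64 = (1 << 64) - 1


def segment(va, va_size):
    """Return the segment name selected by VA<vaSize-1:vaSize-2>."""
    select = (va >> (va_size - 2)) & 0b11
    return {0b00: "seg0", 0b01: "seg0", 0b10: "kseg", 0b11: "seg1"}[select]


def kseg_base(va_size):
    """LEFT_SHIFT(FFFFFC0000000000 hex, vaSize-43), kept to 64 bits."""
    return (0xFFFFFC0000000000 << (va_size - 43)) & MASK64
```

For a 43-bit implementation the base is FFFFFC0000000000₁₆, and the address at that base falls in kseg when classified by the same two bits.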


For seg0 and seg1, the virtual address space is broken into pages, which are the units of
relocation, sharing, and protection. The page size ranges from 8K bytes to 64K bytes. Therefore,
system software should allocate regions with differing protection on 64K-byte virtual address 
boundaries to ensure image compatibility across all Alpha implementations. 


Memory management provides the mechanism to map the active part of the virtual address 
space to the available physical address space. The operating system controls the vir- 
tual-to-physical address mapping tables and saves the inactive (but used) parts of the virtual 
address space on external storage media. 
3.1.1 Segment Seg0 and Seg1 Virtual Address Format


The processor generates a 64-bit virtual address for each instruction and operand in memory. 
A seg0 or seg1 virtual address consists of three or four level-number fields and a
byte_within_page field, as shown in Figures 3-1 and 3-2. 


Figure 3-1: Virtual Address Format, Three-Level Mode


Figure 3-2: Virtual Address Format, Four-Level Mode


* Level0 <M:L+1> contains SEXT(VA<L>), where L is the highest numbered implemented VA bit.


The byte_within_page field can be either 13, 14, 15, or 16 bits depending on a particular
implementation. Thus, the allowable page sizes are 8K bytes, 16K bytes, 32K bytes, and 64K
bytes. The low-order bit in each level-number field is 0 and each field is 0...n bits, where for
example, n is 9 for an 8K page size. Level-number fields are the same size for a given
implementation.


The level-number fields are a function of the page size; all page table entries at any given 
level do not exceed one page. The PFN field in the PTE is always 32 bits wide. Thus, as the
page size grows, the virtual and physical address size also grows. 


Table 3-2 shows the virtual address options and physical address size (in bits) calculations. 
The physical address (bits) column is the maximum physical address allowed by the smaller 
of the kseg size or available physical address bits for a given page size. The available physical 
address bits is calculated by combining the number of bits in the PFN (always 32) with the 
number of bits in the byte_within_page field. The kseg segment size is calculated from the vir- 
tual address size minus 2. 
Table 3-2: Virtual Address Options


Page Size   Byte_within_page   Level Size   Virtual Address   Physical Address
(bytes)     (bits)             (bits)       (bits)¹           (bits)

8K          13                 10           43, 45-53         41, 43-45
16K         14                 11           47, 49-58         45, 46
32K         15                 12           51, 53-63         47
64K         16                 13           55, 57-64²        48


¹ Bit counts for three levels or four levels, respectively (vaSize)
² Level 0 page table not fully utilized for this page size
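

The calculation described before Table 3-2 can be checked with a short Python sketch. It derives the three-level value and the upper end of the four-level range for each page size; the function name and the tuple layout of the result are assumptions for this example, and the lower end of each four-level range in the table is not derived here.

```python
def va_options(page_size):
    """Return (byte_within_page, level_size, va3, va4_max, pa3, pa4_max) in bits."""
    bwp = page_size.bit_length() - 1     # lg(page size)
    level = bwp - 3                      # one page holds page_size/8 PTEs
    va3 = bwp + 3 * level                # three-level virtual address size
    va4 = min(64, bwp + 4 * level)       # four-level size, limited to 64 bits
    avail_pa = 32 + bwp                  # 32-bit PFN plus byte_within_page
    pa3 = min(va3 - 2, avail_pa)         # kseg size is vaSize minus 2
    pa4 = min(va4 - 2, avail_pa)
    return bwp, level, va3, va4, pa3, pa4
```

For an 8K page this yields 43 and 53 virtual address bits and 41 and 45 physical address bits, matching the first row of Table 3-2.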


3.1.2 Kseg Virtual Address Format 


The processor generates a 64-bit virtual address for each instruction and operand in memory. 
A kseg virtual address consists of a segment select field with a value of 10, and a physical
address field. The segment select field is the two bits VA<vaSize-1:vaSize-2>. The physical
address field is VA<vaSize-3:0>.


Figure 3-3: Kseg Virtual Address Format


Fields, from bit 63 down to bit 0: SEXT(segment_select<1>), Segment Select = 10, Physical Address


3.2 Physical Address Space 


Physical addresses are at most vaSize—2 bits. This allows all of physical memory to be 
accessed via kseg. A processor may choose to implement a smaller physical address space by 
not implementing some number of high-order bits. 


The two most significant implemented physical address bits delineate the four regions in the 
physical address space. Implementations use these bits as appropriate for their systems. For 
example, in a workstation with a 30-bit physical address space, bit<29> might select between 
memory and non-memory-like regions, and bit<28> could enable or disable caching (see
Common Architecture, Chapter 5). 


3.3 Memory Management Control 


Memory management is always enabled. Implementations must provide an environment for 
PALcode to service exceptions and to initialize and boot the processor. For example, PALcode
might run with I-stream mapping disabled.


3.4 Page Table Entries


The processor uses a quadword page table entry (PTE) to translate seg0 and seg1 virtual
addresses to physical addresses. A PTE contains hardware and software control information 
and the physical page frame number (PFN). A PTE is a quadword with fields as shown in 
Figure 3-4 and described in Table 3-3. 


Figure 3-4: Page Table Entry (PTE)

Table 3-3: Page Table Entry (PTE) Bit Summary


Bits    Name    Meaning

63-32   PFN     Page frame number.
                The PFN field always points to a page boundary. If V is set, the PFN is
                concatenated with the byte_within_page bits of the virtual address to
                obtain the physical address.

31-16   SW      Reserved for software.

15-14   RSV0    Reserved for hardware; SBZ.

13      UWE     User write enable.
                Enables writes from user mode. If this bit is 0 and a store is attempted
                while in user mode, an access-violation fault occurs. This bit is valid
                even when V=0.

                Note:
                If a write enable bit is set and the corresponding read enable bit is
                not, the operation of the processor is UNDEFINED.

12      KWE     Kernel write enable.
                Enables writes from kernel mode. If this bit is 0 and a store is attempted
                while in kernel mode, an access-violation fault occurs. This bit is valid
                even when V=0.

11-10   RSV1    Reserved for hardware; SBZ.

9       URE     User read enable.
                Enables reads from user mode. If this bit is 0 and a load or instruction
                fetch is attempted while in user mode, an Access Violation occurs. This
                bit is valid even when V=0.

8       KRE     Kernel read enable.
                Enables reads from kernel mode. If this bit is 0 and a load or instruction
                fetch is attempted while in kernel mode, an access-violation fault occurs.
                This bit is valid even when V=0.
Table 3-3: Page Table Entry (PTE) Bit Summary (Continued)


Bits    Name    Meaning

7       RSV2    Reserved for hardware; SBZ.

6-5     GH      Granularity hint.
                Software may set these bits to a non-zero value to supply a hint to
                translation buffer implementations that a block of pages can be treated
                as a single larger page:

                1. A block is an aligned group of 8**N pages, where N is the
                   value of PTE<6:5>, for example, a group of 1, 8, 64, or 512
                   pages starting at a virtual address with page_size + 3*N
                   low-order zeros.

                2. The block is a group of physically contiguous pages that are
                   aligned both virtually and physically. Within the block, the low
                   3*N bits of the PFNs describe the identity mapping and the
                   high 32-3*N PFN bits are all equal.

                3. Within the block, all PTEs have the same values for bits
                   <15:0>. Hardware may use this hint to map the entire block
                   with a single TB entry, instead of 8, 64, or 512 separate TB
                   entries.

4       ASM     Address space match.
                When set, this PTE matches all address space numbers. For a given VA,
                ASM must be set consistently in all processes; otherwise, the address
                mapping is UNPREDICTABLE.

3       FOE     Fault on execute.
                When set, a Fault on Execute exception occurs on an attempt to execute
                any location in the page.

2       FOW     Fault on write.
                When set, a Fault on Write exception occurs on an attempt to write any
                location in the page.

1       FOR     Fault on read.
                When set, a Fault on Read exception occurs on an attempt to read any
                location in the page.

0       V       Valid.
                Indicates the validity of the PFN field. When V is set, the PFN field is
                valid for use by hardware. When V is clear, the PFN field is reserved for
                use by software. The V bit does not affect the validity of PTE<15:1>
                bits.
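

The bit positions in Table 3-3 can be exercised with a small Python sketch that packs and unpacks a PTE. The names FIELDS, make_pte, and decode_pte are hypothetical, as is the block_pages key, which applies the 8**N rule from the granularity hint description.

```python
# Single-bit fields and their positions, taken from Table 3-3.
FIELDS = {"V": 0, "FOR": 1, "FOW": 2, "FOE": 3, "ASM": 4,
          "KRE": 8, "URE": 9, "KWE": 12, "UWE": 13}


def make_pte(pfn, gh=0, sw=0, **flags):
    """Build a complete 64-bit PTE value; reserved bits are left zero."""
    pte = ((pfn & 0xFFFFFFFF) << 32) | ((sw & 0xFFFF) << 16) | ((gh & 0b11) << 5)
    for name, enabled in flags.items():
        if enabled:
            pte |= 1 << FIELDS[name]
    return pte


def decode_pte(pte):
    """Split a PTE into its named fields."""
    out = {name: (pte >> bit) & 1 for name, bit in FIELDS.items()}
    out["GH"] = (pte >> 5) & 0b11
    out["SW"] = (pte >> 16) & 0xFFFF
    out["PFN"] = (pte >> 32) & 0xFFFFFFFF
    out["block_pages"] = 8 ** out["GH"]   # pages covered by the granularity hint
    return out
```

Building the whole value first and then storing it in one step keeps the entry internally consistent, in line with the guidance in Section 3.4.1.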


3.4.1 Changes to Page Table Entries 


The operating system changes PTEs as part of its memory management functions. For exam- 
ple, the operating system may set or clear the V bit, change the PFN field as pages are moved 
to and from external storage media, or modify the software bits. The processor hardware
never changes PTEs.


Software must guarantee that each PTE is always internally consistent. Changing a PTE one
field at a time can cause incorrect system operation, such as setting PTE<V> with one instruc- 
tion before establishing PTE<PFN> with another. Execution of an interrupt service routine 
between the two instructions could use an address that would map using the inconsistent PTE. 
Software can solve this problem by building a complete new PTE in a register and then mov- 
ing the new PTE to the page table by using an STQ instruction. 


Multiprocessing complicates the problem. Another processor could be reading (or even 
changing) the same PTE that the first processor is changing. Such concurrent access must pro- 
duce consistent results. Software must use some form of software synchronization to modify 
PTEs that are already valid. Whenever a processor modifies a valid PTE, it is possible that 
other processors in a multiprocessor system may have old copies of that PTE in their transla- 
tion buffer. Software must inform other processors of changes to PTEs. Hardware must 
ensure that aligned quadword reads and writes are atomic operations. Hardware must not 
cache invalid PTEs (PTEs with the V bit equal to 0) in translation buffers. See Section 3.7 for 
more information. 


3.5 Memory Protection 


Memory protection is the function of validating whether a particular type of access is 
allowed to a specific page from a particular access mode. Access to each page is controlled 
by a protection code that specifies, for each access mode, whether read or write references 
are allowed. The processor uses the following to determine whether an intended access is 
allowed: 


• The virtual address, which is used to either select kseg mapping or provide the index
into the page tables.


• The intended access type (read or write).
• The current access mode based on processor mode.


For protection checks, the intended access is read for data loads and instruction fetches, and 
write for data stores. 


3.5.1 Processor Access Modes 


There are two processor modes, user and kernel. The access mode of a running process is 
stored in the processor status mode bit (PS<mode>). 


3.5.2 Protection Code 


Every page in the virtual address space is protected according to its use. A program may be 
prevented from reading or writing portions of its address space. A protection code associated 
with each page describes the accessibility of the page for each processor mode. 


For seg0 and seg1, the code allows a choice of read or write protection for each processor
mode. For each mode, access can be read/write, read-only, or no-access. Read and write
accessibility and the protection for each mode are specified independently.


For kseg, the protection code is kernel read/write, user no-access.


3.5.3 Access-Violation Faults


An access-violation memory-management fault occurs if an illegal access is attempted, as 
determined by the current processor mode and the page’s protection. 


3.6 Address Translation for Seg0 and Seg1 


The page tables can be accessed from physical memory, or (to reduce overhead) can be 
mapped to a linear region of the virtual address space. The following sections describe both 
access methods. 


3.6.1 Physical Access for Seg0 and Seg1 PTEs 


Seg0 and seg1 address translation can be performed by accessing entries in a multilevel page 
table structure. The page table base register (PTBR) contains the physical page frame number 
(PFN) of the highest-level page table. If the system was booted with three levels of page table,
this is the Level 1 page table. If the system was booted with four levels of page table, this is 
the Level 0 page table. In that case, bits <Level0> of the virtual address are used to index into 
the Level 0 page table to obtain the physical page frame number of the base of the Level 1 
page table. 


With either a three-level or four-level page table, bits <Level1> of the virtual address are used
to index into the Level 1 page table to obtain the physical PFN of the base of the next level 
(Level 2) page table. Bits <Level2> of the virtual address are used to index into the Level 2 
page table to obtain the physical PFN of the base of the next level (Level 3) page table. Bits 
<Level3> of the virtual address are used to index the Level 3 page table to obtain the physical 
PFN of the page being referenced. The PFN is concatenated with virtual address bits 
<byte_within_page> to obtain the physical address of the location being accessed. 


If part of any page table does not reside in a memory-like region, or does reside in nonexistent 
memory, the operation of the processor is UNDEFINED. 


If all the higher-level PTEs (those PTEs that map higher-significance portions of the virtual 
address space than is mapped by Level 3) are valid, the protection bits are ignored; the protec- 
tion code in the Level 3 PTE is used to determine accessibility. If a higher-level PTE 
(numerically, any below Level 3) is invalid, an access-violation fault occurs if the PTE<KRE> 
equals zero. An access-violation fault on any higher-level PTE implies that all lower-level 
page tables mapped by that PTE do not exist. 
The algorithm to generate a physical address from a seg0 or seg1 virtual address follows:


IF {SEXT(VA<(vaSize-1):0>) neq VA} THEN
    {initiate access-violation fault}

IF {booted with 4 levels of page table} THEN
    ! Read physical:
    level0_PTE ← ({PTBR * page_size} + {8 * VA<level0>})
    IF level0_PTE<v> eq 0 THEN
        IF level0_PTE<KRE> eq 0 THEN
            {initiate access-violation fault}
        ELSE
            {initiate translation-not-valid fault}
    ! Read physical:
    level1_PTE ← ({level0_PTE<PFN> * page_size} + {8 * VA<level1>})
ELSE
    ! Read physical:
    level1_PTE ← ({PTBR * page_size} + {8 * VA<level1>})

IF level1_PTE<v> EQ 0 THEN
    IF level1_PTE<KRE> eq 0 THEN
        {initiate access-violation fault}
    ELSE
        {initiate translation-not-valid fault}
! Read physical:
level2_PTE ← ({level1_PTE<PFN> * page_size} + {8 * VA<level2>})
IF level2_PTE<v> EQ 0 THEN
    IF level2_PTE<KRE> eq 0 THEN
        {initiate access-violation fault}
    ELSE
        {initiate translation-not-valid fault}
! Read physical:
level3_PTE ← ({level2_PTE<PFN> * page_size} + {8 * VA<level3>})

IF {{{level3_PTE<UWE> eq 0} AND {write access} AND {ps<mode> EQ 1}} OR
    {{level3_PTE<URE> eq 0} AND {read access} AND {ps<mode> EQ 1}} OR
    {{level3_PTE<KWE> eq 0} AND {write access} AND {ps<mode> EQ 0}} OR
    {{level3_PTE<KRE> eq 0} AND {read access} AND {ps<mode> EQ 0}}}
THEN
    {initiate memory-management fault}
ELSE
    IF level3_PTE<v> EQ 0 THEN
        {initiate memory-management fault}

IF {level3_PTE<FOW> eq 1} AND {write access} THEN
    {initiate memory-management fault}

IF {level3_PTE<FOR> eq 1} AND {read access} THEN
    {initiate memory-management fault}

IF {level3_PTE<FOE> eq 1} AND {execute access} THEN
    {initiate memory-management fault}

Physical_address ← {level3_PTE<PFN> * page_size} OR VA<byte_within_page>
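

The algorithm above can be followed step by step in a Python sketch. Several things here are assumptions for illustration: physical memory is a dictionary from physical address to quadword PTE (missing entries read as zero), faults are raised as an MMFault exception whose kind strings are labels chosen for this example, and an instruction fetch is checked against the read enable bits as described in Section 3.5.

```python
MASK64 = (1 << 64) - 1
V, FOR, FOW, FOE, KRE, URE, KWE, UWE = 0, 1, 2, 3, 8, 9, 12, 13


class MMFault(Exception):
    """Memory-management fault; `kind` is an illustrative label."""

    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind


def bit(pte, n):
    return (pte >> n) & 1


def translate(va, ptbr, mem, access="read", mode=0, page_bits=13, levels=3):
    """Return the physical address for a seg0/seg1 VA, or raise MMFault.

    access is "read", "write", or "execute"; mode is PS<mode> (0 kernel, 1 user).
    """
    level_bits = page_bits - 3
    va_size = min(64, page_bits + levels * level_bits)
    low_mask = (1 << va_size) - 1
    low = va & low_mask
    extended = low | (MASK64 & ~low_mask) if low >> (va_size - 1) else low
    if extended != va:
        raise MMFault("access-violation")

    table_pfn = ptbr
    for depth in range(levels):
        shift = page_bits + (levels - 1 - depth) * level_bits
        index = (va >> shift) & ((1 << level_bits) - 1)
        pte = mem.get((table_pfn << page_bits) + 8 * index, 0)  # read physical
        if depth < levels - 1:                                  # higher-level PTE
            if not bit(pte, V):
                raise MMFault("translation-not-valid" if bit(pte, KRE)
                              else "access-violation")
            table_pfn = pte >> 32

    # pte now holds the Level 3 PTE: protection first, then validity.
    read_en, write_en = (URE, UWE) if mode == 1 else (KRE, KWE)
    if not bit(pte, write_en if access == "write" else read_en):
        raise MMFault("access-violation")
    if not bit(pte, V):
        raise MMFault("translation-not-valid")
    for kind, fault_bit in (("write", FOW), ("read", FOR), ("execute", FOE)):
        if access == kind and bit(pte, fault_bit):
            raise MMFault("fault-on-" + kind)
    return ((pte >> 32) << page_bits) | (va & ((1 << page_bits) - 1))
```

A three-level table with 8K pages needs only three dictionary entries to map one page, which makes the order of the protection, validity, and fault-on checks easy to experiment with.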


3.6.2 Virtual Access for Seg0 or Seg1 PTEs


The page tables can be mapped into a linear region of the virtual address space, reducing the
overhead for seg0 and seg1 PTE accesses. The mapping is done as follows, where, if the system
is booted with three level fields in the virtual address format, Level_count=3. If the
system is booted with four level fields in the virtual address format, Level_count=4.


1. Select a 2**(Level_count*lg(pageSize/8)+3) byte-aligned region (an address with
   Level_count*lg(pageSize/8)+3 low-order zeros) in the seg0 or seg1 address space.


2. Create a PTE to map the page tables as follows.


   PTE = 0        ! Initialize all fields to zero
   ! Set the PFN to the PFN of the most-significant pagetable:
   PTE<63:32> = pfn_of_most_significant_pagetable
   PTE<8> = 1     ! Set the kernel read enable bit
   PTE<0> = 1     ! Set the valid bit


3. Set the page table entry that corresponds to the VPTPTR to the created PTE. If operating
   in the mode of three levels of page table, this is a Level 1 page table. If operating in
   the mode of four levels of page table, this is a Level 0 page table.


4. Set all higher level, valid PTEs that map the Level 3 page tables to allow kernel read
   access. With this setup in place, the algorithm to fetch a seg0 or seg1 PTE is as follows,
   where L_c represents Level_count and pS represents pageSize:


   ! If booted with 3 level fields in the VA format, L_c=3
   ! If booted with 4 level fields in the VA format, L_c=4

   tmp ← LEFT_SHIFT (va, {64 - {{lg(pS)*{L_c+1}} - {L_c*3}}})
   tmp ← RIGHT_SHIFT (tmp, {64 - {{lg(pS)*{L_c+1}} - {L_c*3}} + lg(pS)-3})
   tmp ← VPTB OR tmp
   tmp<2:0> ← 0
   level3_PTE ← (tmp)    ! Load PTE using its virtual address


5. Set the virtual page table pointer (VPTPTR) with a write virtual page table pointer
   instruction (wrvptptr) to the selected value.


The virtual access method is used by PALcode for most TB fills. 
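

The shift sequence used to form the virtual address of a Level 3 PTE can be tried out with the Python sketch below. The function name and default arguments are assumptions for this example; the left shift is truncated to 64 bits explicitly, and the sketch only handles configurations in which lg(pS)*(L_c+1) - L_c*3 does not exceed 64.

```python
MASK64 = (1 << 64) - 1


def level3_pte_va(va, vptb, page_size=8192, level_count=3):
    """Return the virtual address of the Level 3 PTE that maps va."""
    lg_ps = page_size.bit_length() - 1
    va_size = lg_ps * (level_count + 1) - level_count * 3
    tmp = (va << (64 - va_size)) & MASK64     # discard the sign-extension bits
    tmp >>= (64 - va_size) + lg_ps - 3        # leaves (virtual page number * 8)
    tmp |= vptb                               # place it inside the VPT region
    return tmp & ~0b111                       # tmp<2:0> = 0
```

With 8K pages and three levels, the region selected in the first step is 2**33 bytes, so a value such as FFFFFFFE00000000₁₆ is a suitably aligned choice of VPTB for experimenting with this function.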
Implementation Note:


Assume the following:

• A system with a 52-bit virtual address size

• VPTB is the index of the top-level page table entry, which is self-referencing.

• The virtual address is in seg0 or seg1.

For a virtual address B, mapped with a three-level page table, the address to virtually
access the Level 3 PTE is as follows. The double-miss TB fill flow is a three-level flow.


Figure 3-5: Three-Level Page Table Mapping


Bits    63:43        42:33   32:23      22:13      12:3       2:0
Field   SEXT(VPTB)   VPTB    B<42:33>   B<32:23>   B<22:13>   0


For a virtual address A, mapped with a four-level page table, the address to virtually
access the Level 3 PTE is shown in Figure 3-6. The double-miss TB fill flow is a
four-level flow.


Figure 3-6: Four-Level Page Table Mapping


Bits    63:53        52:43   42:33      32:23      22:13      12:3       2:0
Field   SEXT(VPTB)   VPTB    A<52:43>   A<42:33>   A<32:23>   A<22:13>   0


3.7 Translation Buffer 


In order to save actual memory references when repeatedly referencing the same pages, hard- 
ware implementations include a translation buffer to remember successful virtual address 
translations and page states. 


When the process context is changed, a new value is loaded into the address space number 
(ASN) internal processor register with a swap process context (swpctx) instruction. This 
causes address translations for pages with PTE<ASM> clear to be invalidated on a processor 
that does not implement address space numbers. 


Additionally, when the software changes any part (except the software field) of a valid PTE, it
must also execute a tbi instruction. The entire translation buffer can be invalidated by tbia, and
all ASM=0 entries can be invalidated by tbiap. The translation buffer must not store invalid
PTEs. Therefore, the software is not required to invalidate translation buffer entries when
making changes for PTEs that are already invalid.


After software changes a valid zero-, first-, or second-level PTE, software must flush the
translation for the corresponding page in the virtual page table. Then software must flush the
translations of all valid pages mapped by that page. In the case of a change to a first-level 
PTE, this action must be taken through a second iteration. In the case of a change to a 
zero-level PTE, this action must be taken through a second and third iteration. 
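

The invalidation rules above can be pictured with a toy software model. This is a sketch under stated assumptions, not a description of any hardware: the class TranslationBuffer and its methods fill, lookup, tbia, and tbiap are hypothetical names, entries are tagged with an ASN plus the PTE<ASM> bit, and single-entry invalidation is not modeled.

```python
class TranslationBuffer:
    """Toy TB: caches translations tagged with an ASN and the ASM bit."""

    def __init__(self):
        self.entries = []  # each entry: dict with vpn, pfn, asn, asm

    def fill(self, vpn, pfn, asn, asm, valid=True):
        # The TB must not store invalid PTEs, so an invalid PTE is never cached.
        if not valid:
            return
        self.entries.append({"vpn": vpn, "pfn": pfn, "asn": asn, "asm": asm})

    def lookup(self, vpn, asn):
        # An ASM=1 entry matches any ASN; otherwise the ASN must match.
        for e in self.entries:
            if e["vpn"] == vpn and (e["asm"] or e["asn"] == asn):
                return e["pfn"]
        return None

    def tbia(self):
        # Invalidate the entire translation buffer.
        self.entries.clear()

    def tbiap(self):
        # Invalidate all ASM=0 entries; ASM=1 entries survive.
        self.entries = [e for e in self.entries if e["asm"]]
```

Because invalid PTEs are never cached in this model, changing an already-invalid PTE needs no invalidation call, which mirrors the rule stated above.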


3.8 Address Space Numbers 


The Alpha architecture allows a processor to optionally implement address space numbers
(process tags) to reduce the need for invalidation of cached address translations for
process-specific addresses when a context switch occurs. The supported address space number
(ASN) range is 0...MAX_ASN; MAX_ASN is provided in the HWRPB MAX_ASN field.


The address space number for the current process is loaded by software in the address space 
number (ASN) with a swpctx instruction. ASNs are processor specific and the hardware 
makes no attempt to maintain coherency across multiple processors. In a multiprocessor sys- 
tem, software is responsible for ensuring the consistency of TB entries for processes that 
might be rescheduled on different processors. 


Systems that support ASNs should have MAX_ASN in the range 13...65535. The number of
ASNs should be determined by the market a system is targeting.


Programming Note: 


System software should not assume that the number of ASNs is a power of two. This
allows hardware, for example, to use N TB tag bits to encode (2**N)-3 ASN values, one
value for ASM=1 PTEs, and one for invalid.


There are several possible ways of using ASNs that result from several complications in a
multiprocessor system. Consider the case where a process that executed on processor-1 is
rescheduled on processor-2. If a page is deleted or its protection is changed, the TB in
processor-1 has stale data.


• One solution is to send an interprocessor interrupt to all the processors on which this
process could have run and cause them to invalidate the changed PTE. That results in
significant overhead in a system with several processors.


• Another solution is to have software invalidate all TB entries for a process on a new
processor before it can begin execution, if the process executed on another processor
during its previous execution. This ensures the deletion of possibly stale TB entries on
the new processor.


• A third solution is to assign a new ASN whenever a process is run on a processor that is
not the same as the last processor on which it ran.
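

The third approach can be sketched as follows. This is a minimal illustration, not an operating system's actual scheduler: AsnAllocator, Process, and schedule are hypothetical names, each processor is assumed to hand out ASNs from its own counter, and running out of ASNs is handled by a counted stand-in for invalidating all process-specific TB entries.

```python
class AsnAllocator:
    """Per-processor ASN allocator (illustrative only)."""

    def __init__(self, max_asn):
        self.max_asn = max_asn   # ASNs range over 0...max_asn
        self.next_asn = 0
        self.flushes = 0         # stand-in for invalidating ASM=0 TB entries

    def allocate(self):
        if self.next_asn > self.max_asn:
            # All ASNs used: old translations must go before numbers are reused.
            self.flushes += 1
            self.next_asn = 0
        asn = self.next_asn
        self.next_asn += 1
        return asn


class Process:
    def __init__(self):
        self.last_cpu = None
        self.asn = None


def schedule(proc, cpu_id, allocators):
    """Give the process a fresh ASN if it last ran on a different processor."""
    if proc.last_cpu != cpu_id:
        proc.asn = allocators[cpu_id].allocate()
        proc.last_cpu = cpu_id
    return proc.asn
```

A fresh ASN means stale entries tagged with the old ASN on that processor can never match. The sketch does not track which other processes still hold an ASN after the counter wraps; a real kernel would need some form of generation tracking for that.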


3.9 Memory-Management Faults


On a memory-management fault, the fault code (MMCSR) is passed in a1 to specify the type
of fault encountered, as shown in Table 3-4.


Table 3-4: Memory-Management Fault Type Codes 


Fault                      MMCSR value
Translation not valid      0
Access-violation           1
Fault on read              2
Fault on execute           3
Fault on write             4


A translation-not-valid fault is taken when a read or write reference is attempted 
through an invalid PTE in a zero (if one exists), first, second, or third-level page table. 


An access-violation fault is taken on a reference to a seg0 or seg1 address when the
protection field of the third-level PTE that maps the data indicates that the intended page
reference would be illegal in the specified access mode. An access-violation fault is
also taken if the KRE bit is a zero in an invalid zero (if one exists), first, or second-level
PTE. An access-violation fault is generated for any access to a kseg address when the
mode is user (PS<mode> EQ 1).


A fault-on-read (FOR) fault occurs when a read is attempted with PTE<FOR> set. 


A fault-on-execute (FOE) fault occurs when an instruction fetch is attempted with 
PTE<FOE> set. 


A fault-on-write (FOW) fault occurs when a write is attempted with PTE<FOW> set.


3.10 Revision History


Revision 7.0, November 10, 1997


1. DEC OSF/1 --> Digital UNIX
2. Alpha AXP --> Alpha
3. Digital --> DIGITAL
4. Added ECO 99


Revision 6.0, December, 1994


1. Added ECO 67 for the translation buffer section
2. OSF/1 --> DEC OSF/1
3. Alpha --> Alpha AXP
4. Cleaned-up physical address space section (3.3)


Revision 1.0, May 12, 1992


1. First review distribution


DIGITAL Restricted Distribution 




Chapter 4 


Process Structure (II-B) 


4.1 Process Definition 


A process is a single thread of execution. It is the basic entity that can be scheduled and is exe- 
cuted by the processor. A process consists of an address space and both software and hardware 
context. The hardware context of a process is defined by the following:


• Thirty integer registers (excludes R31 and SP)
• Thirty-one floating-point registers (excludes F31)
• The program counter (PC)
• The two per-process stack pointers (USP/KSP)
• The processor status (PS)
• The address space number (ASN)
• The charged process cycles
• The page table base register (PTBR)
• The process unique value (unique)
• The floating-point enable register (FEN)
• The performance monitoring enable bit (PME)


This information must be loaded if a process is to execute. 


While a process is executing, some of its hardware context is being updated in the internal reg- 
isters. When a process is not being executed, its hardware context is stored in memory in a 
software structure called the process control block (PCB). Saving the process context in the 
PCB and loading new values from another PCB for a new context is called context switching. 
Context switching occurs as one process after another is scheduled for execution. 


4.2 Process Control Block (PCB)


As shown in Figure 4-1, the PCB holds the state of a process.


Figure 4-1: Process Control Block (PCB) 


[Figure 4-1 is not reproduced here. The surviving labels show bit positions 63, 62, 61, 32, 31, 1, and 0 across the top, and the last two quadwords of the PCB marked Reserved to Digital.]


The contents of the PCB are loaded and saved by the swap process context (swpctx) instruc- 
tion. The PCB must be quadword aligned and lie within a single page of physical memory. It 
should be 64-byte aligned for best performance. 


The PCB for the current process is specified by the process control block base address register
(PCBB); see Table 1-3.


The swap privileged context instruction (swpctx) saves the privileged context of the current 
process into the PCB specified by PCBB, loads a new value into PCBB, and then loads the 
privileged context of the new process into the appropriate hardware registers. 


The new value loaded into PCBB, as well as the contents of the PCB, must satisfy certain con- 
straints or an UNDEFINED operation results: 


1. The physical address loaded into PCBB must be quadword aligned and describe eight
contiguous quadwords that are in a memory-like region (see Common Architecture,
Chapter 5).


2. The value of PTBR must be the page frame number (PFN) of an existent page that is in 
a memory-like region. 


It is the responsibility of the operating system to save and load the non-privileged part of the 
hardware context. 


The swpctx instruction returns ownership of the current PCB to operating system software and
passes ownership of the new PCB from the operating system to the processor. Any attempt to
write a PCB while ownership resides with the processor has UNDEFINED results. If the PCB
is read while ownership resides with the processor, it is UNPREDICTABLE whether the
original or an updated value of a field is read. The processor is free to update a PCB field at any
time. The decision as to whether or not a field is updated is made individually for each field.


The charged process cycles is the total number of PCC register counts that are charged to the
process (modulo 2**32). When a process context is loaded by the swpctx instruction, the
contents of the PCC count field (PCC_CNT) are subtracted from the contents of PCB[24]<31:0>
and the result is written to the PCC offset field (PCC_OFF):


PCC<63:32> ← (PCB[24]<31:0> - PCC<31:0>)


When a process context is saved by the swpctx instruction, the charged process cycles is com- 
puted by performing an unsigned add of PCC<63:32> and PCC<31:0>. That value is written 
to PCB[24]<31:0>. 


Software Programming Note: 


The following example returns in R0 the current PCC register count (modulo 2**32) for a
process. Notice the care taken not to cause an unwanted sign extension.


        RPCC    R0              ; Read the processor cycle counter
        SLL     R0, #32, R1     ; Line up the offset and count fields
        ADDQ    R0, R1, R0      ; Do add
        SRL     R0, #32, R0     ; Zero extend the cycle count to 64 bits
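

The charged-cycles arithmetic can be checked with a small model. The Python sketch below is illustrative only; the function names are hypothetical, and 64-bit register behavior is imitated with explicit masks.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def pcc_off_on_load(pcb_charged, pcc_cnt):
    """PCC<63:32> <- PCB[24]<31:0> - PCC<31:0>, modulo 2**32."""
    return (pcb_charged - pcc_cnt) & MASK32

def charged_on_save(pcc):
    """Unsigned add of PCC<63:32> and PCC<31:0>, modulo 2**32."""
    return ((pcc >> 32) + (pcc & MASK32)) & MASK32

def rpcc_sequence(pcc):
    """Mirror the RPCC / SLL / ADDQ / SRL sequence on a 64-bit value."""
    r0 = pcc & MASK64
    r1 = (r0 << 32) & MASK64   # SLL: count field moves up to bits <63:32>
    r0 = (r0 + r1) & MASK64    # ADDQ: bits <63:32> now hold offset + count
    return r0 >> 32            # SRL: zero-extend the 32-bit sum
```

For example, a process loaded with 1000 charged cycles while the count field reads 0xFFFFFFF0 gets an offset of 1016; after the count wraps and reads 0x10 (32 counts later), both charged_on_save and rpcc_sequence yield 1032.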


If ASNs are not implemented, the ASN field is not read or written by PALcode.


The process unique value is that value used in support of multithread implementations. The
value is stored in the PCB when the process is not active. When the process is active, the
value may be cached in hardware internal storage or kept in the PCB only.


The FEN bit reflects the setting of the FEN IPR. 


Setting the PME bit alerts any performance hardware or software in the system to monitor the 
performance of this process. 


Kernel mode code must use the rdusp/wrusp instructions to access the USP. Kernel mode code 
can read the PTBR, the ASN, the FEN, and the PME for the current process from the PCB. 
The unique value can be accessed with the rdunique and wrunique instructions. 


4.3 Revision History


Revision 7.0, November 10, 1997


1. Alpha AXP --> Alpha
2. DEC OSF/1 --> Digital UNIX
3. Digital --> DIGITAL


Revision 6.0, December 1994


1. OSF/1 --> DEC OSF/1
2. Alpha --> Alpha AXP
3. Added PME and Charged Process Cycles to PCB and supporting text


Revision 1.0, May 12, 1992


1. First review distribution


Chapter 5


Exceptions and Interrupts (II-B) 


5.1 Introduction 


At certain times during the operation of a system, events within the system require the execu- 
tion of software outside the explicit flow of control. When such an event occurs, an Alpha 
processor forces a change in control flow from that indicated by the current instruction stream. 
The notification process for such an event is either an exception or an interrupt. 


5.1.1 Exceptions 


Exceptions occur primarily in relation to the currently executing process. Exception service 
routines execute in response to exception conditions caused by software. All exception service 
routines execute in kernel mode on the kernel stack. Exception conditions consist of faults, 
arithmetic traps, and synchronous traps: 


A fault occurs during an instruction and leaves the registers and memory in a consistent
state such that elimination of the fault condition and subsequent reexecution of the
instruction gives correct results. Faults are not guaranteed to leave the machine in 
exactly the same state it was in immediately prior to the fault, but rather in a state such 
that the instruction can be correctly executed if the fault condition is removed. The PC 
saved in the exception stack frame is the address of the faulting instruction. An rti 
instruction to that PC reexecutes the faulting instruction. 


An arithmetic trap occurs at the completion of the operation that caused the exception. 
Since several instructions may be in various stages of execution at any point in time, it 
is possible for multiple arithmetic traps to occur simultaneously. 


The PC that is saved in the exception frame on traps is that of the next instruction that 
would have been issued if the trapping conditions had not occurred. However, that PC 
is not necessarily the address of the instruction immediately following the instruction 
that encountered the trap condition, and the intervening instructions are collectively 
called the trap shadow. See Common Architecture, Chapter 4, Arithmetic Trap
Completion, for information.


The intervening instructions may have changed operands or other state used by the
instructions encountering the trap conditions. If such is the case, an rti instruction to
that PC does not reexecute the trapping instructions, nor does it reexecute any
intervening instructions; it simply continues execution from the point at which the trap
was taken.


In general, it is difficult to fix up results and continue program execution at the point
of an arithmetic trap. Software can force a trap to be continued more easily without
the need for complicated fixup code. This is accomplished by specifying any valid
qualifier combination that includes the /S qualifier with each such instruction and
following a set of code-generation restrictions in the code that could cause arithmetic
traps, allowing those traps to be completed by an OS completion handler.


The AND of all the exception completion qualifiers for trapping instructions is 
provided to the OS completion handler in the exception summary SWC bit. If SWC is 
set, a completion handler may find the trigger instruction by scanning backward from 
the trap PC until each register in the register write mask has been an instruction 
destination. The trigger instruction is the last instruction in I-stream order to get a trap 
before the trap shadow. If the SWC bit is clear, no fixup is possible. 
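

The backward scan can be sketched in a few lines. This is a simplified illustration, not a complete completion handler: the names mask_to_regs and find_trigger are hypothetical, instruction decoding is assumed to have been done already (each instruction is reduced to its destination register number), and the code is assumed to follow the code-generation restrictions with SWC set.

```python
def mask_to_regs(write_mask):
    """Bit n set in the mask -> register number n (0-31 integer, 32-63 floating)."""
    return {n for n in range(64) if (write_mask >> n) & 1}

def find_trigger(dests, trap_index, write_mask):
    """Scan backward from the trap PC for the trigger instruction.

    dests      -- dests[i] is the destination register number of instruction i,
                  or None if it writes no register (hypothetical pre-decoded form)
    trap_index -- index of the instruction at the saved trap PC
    Returns the index of the trigger instruction, or None if it is not found.
    """
    remaining = mask_to_regs(write_mask)
    if not remaining:
        return None
    for i in range(trap_index - 1, -1, -1):
        remaining.discard(dests[i])   # this register has now been seen as a destination
        if not remaining:
            return i                  # every masked register accounted for
    return None
```

The scan stops at the first point, walking backward, where every register in the write mask has appeared as a destination; that instruction is the earliest one that can have trapped.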


A synchronous trap occurs at the completion of the operation that caused the exception. 
No instructions can be issued between the completion of the operation that caused the
exception and the trap. 


5.1.2 Interrupts 


The processor arbitrates interrupt requests. When the interrupt priority level (IPL) of an out- 
standing interrupt is greater than the current IPL, the processor raises IPL to the level of the 
interrupt and dispatches to entInt, the interrupt entry to the OS. Interrupts are serviced in ker- 
nel mode on the kernel stack. Interrupts can come from one of five sources: interprocessor 
interrupts, I/O devices, the clock, performance counters, or machine checks. 


5.2 Processor Status 


The processor status (PS) is a four-bit register that contains the current mode (PS<mode>) in
bit <3> and a three-bit interrupt priority level (PS<IPL>) in bits <2...0>. The PS<mode> bit is
zero for kernel mode and one for user mode. The PS<IPL> bits are always zero if the mode is
user and can be 0 to 7 if the mode is kernel. The PS is changed when an interrupt or
exception is initiated and by the rti, retsys, and swpipl instructions.


The uses of the PS values are shown in Table 5-1.


Table 5-1: Processor Status Summary


PS<mode>   PS<IPL>   Mode     Use
1          0         User     User software
0          0         Kernel   System software
0          1         Kernel   System software
0          2         Kernel   System software
0          3         Kernel   Low priority device interrupts
0          4         Kernel   High priority device interrupts
0          5         Kernel   Clock, and interprocessor interrupts
0          6         Kernel   Real-time devices
0          6         Kernel   Correctable error reporting
0          7         Kernel   Machine checks
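

As a quick illustration of the PS layout, the following Python sketch splits a PS value into its two fields. The function name decode_ps is hypothetical.

```python
def decode_ps(ps):
    """Split a four-bit PS value into (mode, ipl)."""
    mode = "user" if (ps >> 3) & 1 else "kernel"   # PS<mode> is bit <3>
    ipl = ps & 0x7                                 # PS<IPL> is bits <2...0>
    return mode, ipl
```

A PS of 8 decodes to user mode at IPL 0, and a PS of 7 decodes to kernel mode at IPL 7, the level used for machine checks.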


5.3 Stack Frames 


There are three types of system entries: entries for the callsys instruction from user mode, 
entries for exceptions and interrupts from kernel mode, and entries for interrupts from user 
mode. 


Those three types of system entries use one of two stack frame layouts, as follows.


Entries for the callsys instruction from user mode, and entries for exceptions and interrupts
from kernel mode use the same stack frame layout, as shown in Figure 5-1. The stack frame
contains space for the PC, the PS, the saved GP, and the saved registers a0, a1, a2. On entry,
the SP points to the saved PS.


The callsys entry saves the PC, the PS, and the GP. The exception and interrupt entries save 
the PC, the PS, the GP, and also save the registers a0...a2. 


Figure 5-1: Stack Frame Layout for callsys and rti 





Entries for interrupts from user mode use the stack frame layout as shown in Figure 5-2. The
stack frame must be aligned on a 64-byte boundary and contains the registers at, SP, PS, PC,
GP, and saved registers a0, a1, and a2.


Figure 5-2: Stack Frame Layout for urti





5.4 System Entry Addresses


All system entries are in kernel mode. The interrupt priority PS bits (PS<IPL>) are set as
shown in the following table. The system entry point address is set by the wrent instruction, as
described in Section 2.2.14.


Table 5-2: Entry Point Address Registers 


Entry Point   Value in a0               Value in a1      Value in a2           PS<IPL>
entArith      Exception summary         Register mask    UNPREDICTABLE         Unchanged
entIF         Fault or trap type code   UNPREDICTABLE    UNPREDICTABLE         Unchanged
entInt        Interrupt type            Vector           Interrupt parameter   Priority of interrupt
entMM         VA                        MMCSR            Cause                 Unchanged
entSys        p0                        p1               p2                    Unchanged
entUna        VA                        Opcode           Src/Dst               Unchanged


5.4.1 System Entry Arithmetic Trap (entArith) 


The arithmetic trap entry, entArith, is called when an arithmetic trap occurs. On entry, a0
contains the exception summary register and a1 contains the exception register write mask.
Section 5.4.1.1 describes the exception summary register and Section 5.4.1.2 describes the
register write mask.


5.4.1.1 Exception Summary Register


The exception summary register, shown in Figure 5-3 and described in Table 5-3, records the
various types of arithmetic exceptions that can occur together.


Figure 5-3: Exception Summary Register 


Table 5-3: Exception Summary Register Bit Definitions





Bit Description 
63-7 Zero. 
6 Integer overflow (IOV) 


An integer arithmetic operation or a conversion from floating to integer over- 
flowed the destination precision. 


An IOV trap is reported for any integer operation whose true result exceeds the 
destination register size. Integer overflow trap enable can be specified in each 
arithmetic integer operate instruction and each floating-point convert-to-integer 
instruction. If integer overflow occurs, the result register is written with the trun- 
cated true result. 


5 Inexact result (INE) 
A floating arithmetic or conversion operation gave a result that differed from the 
mathematically exact result. 


An INE trap is reported if the rounded result of an IEEE operation is not exact. 
Inexact result trap enable can be specified in each IEEE floating-point operate 
instruction. The rounded result value is stored in all cases. 


4 Underflow (UNF) 
A floating arithmetic or conversion operation underflowed the destination expo- 
nent. 


An UNF trap is reported when the destination’s smallest finite number exceeds in 
magnitude the non-zero rounded true result. Floating underflow trap enable can be 
specified in each floating-point operate instruction. If underflow occurs, the result 
register is written with a true zero. 


3 Overflow (OVF) 
A floating arithmetic or conversion operation overflowed the destination exponent. 


An OVF trap is reported when the destination’s largest finite number is exceeded 
in magnitude by the rounded true result. Floating overflow traps are always 
enabled. If this trap occurs, the result register is written with an UNPREDICT- 
ABLE value. 


2 Division by zero (DZE) 
An attempt was made to perform a floating divide operation with a divisor of zero. 


A DZE trap is reported when a finite number is divided by zero. Floating divide by 
zero traps are always enabled. If this trap occurs, the result register is written with 
an UNPREDICTABLE value. 


1 Invalid operation (INV) 
An attempt was made to perform a floating arithmetic, conversion, or comparison 
operation, and one or more of the operand values were illegal. 


An INV trap is reported for most floating-point operate instructions with an input 
operand that is an IEEE NaN, IEEE infinity, or IEEE denormal. 


Floating invalid operation traps are always enabled. If this trap occurs, the result 
register is written with an UNPREDICTABLE value. 


0 Software completion (SWC) 
Is set when all of the other arithmetic exception bits were set by floating-operate 
instructions with the /S qualifier set. See Common Architecture, Chapter 4, Arith- 
metic Trap Completion, for rules about setting the /S qualifier in code that may 
cause an arithmetic trap, and Section 5.1.1 for rules about using the SWC bit in a 
trap handler. 
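

A handler might turn the exception summary value passed in a0 into a list of names, as in this Python sketch. The names EXC_SUM_BITS and decode_exc_sum are hypothetical; the bit assignments are those of Table 5-3.

```python
EXC_SUM_BITS = {
    6: "IOV",  # integer overflow
    5: "INE",  # inexact result
    4: "UNF",  # underflow
    3: "OVF",  # overflow
    2: "DZE",  # division by zero
    1: "INV",  # invalid operation
    0: "SWC",  # software completion
}

def decode_exc_sum(a0):
    """Return the names of the bits set in the exception summary, high bit first."""
    return [name for bit, name in sorted(EXC_SUM_BITS.items(), reverse=True)
            if (a0 >> bit) & 1]
```

For example, a summary value of 9 (bits 3 and 0) decodes to OVF with SWC set.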


5.4.1.2 Exception Register Write Mask 


The exception register write mask parameter records all registers that were targets of instruc- 
tions that set the bits in the exception summary register. There is a one-to-one correspondence 
between bits in the register write mask quadword and the register numbers. The quadword, 
starting at bit 0 and proceeding right to left, records which of the registers r0 through r31, then 
f0 through f31, received an exceptional result. 
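

The bit-to-register correspondence can be illustrated with a short Python sketch. The function name decode_write_mask is hypothetical; the mapping (bits 0 through 31 for r0 through r31, bits 32 through 63 for f0 through f31) is the one described above.

```python
def decode_write_mask(a1):
    """Return the register names whose bits are set in the register write mask."""
    names = []
    for bit in range(64):
        if (a1 >> bit) & 1:
            # Bits 0-31 are integer registers, bits 32-63 are floating-point registers.
            names.append(f"r{bit}" if bit < 32 else f"f{bit - 32}")
    return names
```

Under this mapping, floating-point register F3 corresponds to bit 35 of the mask.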


Note: 


For a sequence such as: 


ADDF F1,F2,F3 
MULF F4,F5,F3 


if the add overflows and the multiply does not, the OVF bit is set in the exception
summary, and the F3 bit is set in the register mask, even though the overflowed sum in F3
can be overwritten with an in-range product by the time the trap is taken. (This code
violates the destination reuse rule for exception completion. See Common Architecture,
Chapter 4, Arithmetic Trap Shadows, for the destination reuse rules.)


The PC value saved in the exception stack frame is the virtual address of the next instruction.
This is defined as the virtual address of the first instruction not executed after the trap condi- 
tion was recognized. 


5.4.2 System Entry Instruction Fault (entIF) 


The instruction fault or synchronous trap entry is called for bpt, bugchk, gentrap, and opDec
synchronous traps, and for a FEN fault (floating-point instruction when the floating-point unit
is disabled, FEN EQ 0). On entry, a0 contains a 0 for a bpt, a 1 for bugchk, a 2 for gentrap, a
3 for FEN fault, and a 4 for opDec. No additional data is passed in a1...a2. The saved PC at
(SP+00) is the address of the instruction that caused the fault for FEN faults. The saved PC at
(SP+00) is the address of the instruction after the instruction that caused the bpt, bugchk,
gentrap, and opDec synchronous traps.
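

The difference in saved PC between the FEN fault and the synchronous traps is easy to get wrong, so a small Python sketch may help. The names ENTIF_CODES and describe_entif are hypothetical, and the sketch assumes four-byte instructions.

```python
ENTIF_CODES = {0: "bpt", 1: "bugchk", 2: "gentrap", 3: "FEN fault", 4: "opDec"}

def describe_entif(a0, saved_pc):
    """Return (kind, address of the instruction that caused the entry)."""
    kind = ENTIF_CODES[a0]
    if kind == "FEN fault":
        cause_pc = saved_pc        # a fault saves the PC of the faulting instruction
    else:
        cause_pc = saved_pc - 4    # a trap saves the PC of the following instruction
    return kind, cause_pc
```

Returning to the saved PC therefore reexecutes the instruction after a FEN fault but continues past it after a synchronous trap.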


5.4.3 System Entry Hardware Interrupts (entInt) 


The interrupt entry is called to service a hardware interrupt or a machine check. Table 5-4
shows what is passed in a0...a2 and the PS<IPL> setting for various interrupts.


Table 5-4: System Entry Hardware Interrupts


Entry Type                 Value in a0   Value in a1        Value in a2              PS<IPL>
Interprocessor interrupt   0             UNPREDICTABLE      UNPREDICTABLE            5
Clock                      1             UNPREDICTABLE      UNPREDICTABLE            5
Correctable error          2             Interrupt vector   Pointer to Logout Area   7
Machine check              2             Interrupt vector   Pointer to Logout Area   7
I/O device interrupt       3             Interrupt vector   UNPREDICTABLE            Level of device
Performance counter        4             Interrupt vector   UNPREDICTABLE            6


On entry to the hardware interrupt routine, the IPL has been set to the level of the interrupt.
For hardware interrupts, register a1 contains a platform-specific interrupt vector. That
platform-specific interrupt vector is typically the same value as the SCB offset value that would
be returned if the platform was running OpenVMS Alpha PALcode.


For a correctable error or machine check interrupt, a1 contains a platform-specific interrupt
vector and a2 contains the kseg address of the platform-specific logout area. The interrupt
vector value and logout area format are typically the same as those used by the platform when
running OpenVMS Alpha PALcode.


The machine check error summary (MCES) register, shown in Figure 5-4 and described in
Table 5-5, records the correctable error and machine check interrupts in progress.


Figure 5-4: Machine Check Error Status (MCES) Register


Table 5-5: Machine Check Error Status (MCES) Register Bit Definitions


Bit     Symbol   Description
63-32   IMP.
31-5    Reserved.
4       DSC      Disable system correctable error in progress.
                 Set to disable system correctable error reporting.
3       DPC      Disable processor correctable error in progress.
                 Set to disable processor correctable error reporting.
2       PCE      Processor correctable error in progress.
                 Set when a processor correctable error is detected. Should be cleared
                 by the processor correctable error handler when the logout frame may
                 be reused.
1       SCE      System correctable error in progress.
                 Set when a system correctable error is detected. Should be cleared by
                 the system correctable error handler when the logout frame may be
                 reused.
0       MIP      Machine check in progress.
                 Set when a machine check occurs. Must be cleared by the machine
                 check handler when a subsequent machine check can be handled.
                 Used to detect double machine checks.


The MIP flag in the MCES register is set prior to invoking the machine check handler. If the
MIP flag is set when a machine check is being initiated, a double machine check halt is
initiated instead. The machine check handler needs to clear the MIP flag when it can handle a new
machine check.


Similarly, the SCE or PCE flag in the MCES register is set prior to invoking the appropriate
correctable error handler. That error handler should clear the appropriate correctable error in 
progress when the logout area can be reused by hardware or PALcode. PALcode does not 
overwrite the logout area. 


Correctable processor or system error reporting may be suppressed by setting the respective
DPC or DSC flag in the MCES register. When the DPC or DSC flag is set, the corresponding
error is corrected, but no correctable error interrupt is generated.
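

The flag handling described in this section can be modeled as a pair of pure functions. This Python sketch is illustrative only: the function names are hypothetical, the bit positions come from Table 5-5, and the way a handler later clears a flag is deliberately left out.

```python
MIP, SCE, PCE, DPC, DSC = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4

def begin_machine_check(mces):
    """Return (new_mces, action) when a machine check is being initiated."""
    if mces & MIP:
        # A machine check is already in progress.
        return mces, "double machine check halt"
    return mces | MIP, "invoke machine check handler"

def report_correctable(mces, processor):
    """Return (new_mces, interrupt_generated) for a corrected error.

    processor -- True for a processor correctable error, False for a system one
    """
    disable, in_progress = (DPC, PCE) if processor else (DSC, SCE)
    if mces & disable:
        return mces, False           # error is corrected silently
    return mces | in_progress, True  # flag set before the handler is invoked
```

Starting from a clear register, a first machine check sets MIP and invokes the handler; a second one arriving before the handler clears MIP leads to the double machine check halt.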


5.4.4 System Entry MM Fault (entMM)


The memory-management fault entry is called when a memory management exception occurs.
On entry, a0 contains the faulting virtual address and a1 contains the MMCSR (see Section
3.9). On entry, a2 is set to a minus one (-1) for an instruction fetch fault, to a plus one (+1) for
a fault caused by a store instruction, or to a 0 for a fault caused by a load instruction.
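

A handler's first step is usually to classify the fault from the three arguments. The Python sketch below is an illustration, not kernel code: the names MMCSR_CODES, CAUSE, and describe_entmm are hypothetical, the fault codes are those listed in Section 3.9, and a2 is treated as a raw 64-bit register value that must be read as signed.

```python
MASK64 = (1 << 64) - 1

MMCSR_CODES = {
    0: "translation not valid",
    1: "access violation",
    2: "fault on read",
    3: "fault on execute",
    4: "fault on write",
}
CAUSE = {-1: "instruction fetch", 0: "load", 1: "store"}

def to_signed64(x):
    """Interpret a 64-bit register value as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x

def describe_entmm(a0, a1, a2):
    """Classify a memory-management fault from the entMM arguments."""
    return {"va": a0, "fault": MMCSR_CODES[a1], "access": CAUSE[to_signed64(a2)]}
```

An a2 value with all 64 bits set reads as -1 and so indicates an instruction fetch.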


5.4.5 System Entry Call System (entSys) 


The system call entry is called when a callsys instruction is executed in user mode. On entry, 
only registers (t8...t11) have been modified. The PC+4 of the callsys instruction, the user glo- 
bal pointer, and the current PS are saved on the kernel stack. Additional space for a0...a2 is 
allocated. After completion of the system service routine, the kernel code executes a 
CALL_PAL retsys instruction. 


5.4.6 System Entry Unaligned Access (entUna) 


The unaligned access entry is called when a load or store access is not aligned. On entry, a0 
contains the faulting virtual address, al contains the zero extended six-bit opcode (bits 
<31:26>) of the faulting instruction, and a2 contains the zero extended data source or destina- 
tion register number (bits<25:21>) of the faulting instruction. 
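

The two fields passed in a1 and a2 are plain bit extractions from the 32-bit instruction word, as this Python sketch shows. The function name entuna_args is hypothetical.

```python
def entuna_args(va, instruction):
    """Return the (a0, a1, a2) values for an unaligned access.

    va          -- the faulting virtual address
    instruction -- the 32-bit faulting instruction word
    """
    opcode = (instruction >> 26) & 0x3F   # bits <31:26>, zero extended
    reg = (instruction >> 21) & 0x1F      # bits <25:21>, zero extended
    return va, opcode, reg
```

A handler can use the opcode to tell loads from stores and the register number to find where the data must be read from or written to.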


5.5 PALcode Support 


5.5.1 Stack Writeability and Alignment 


PALcode only accesses the kernel stack. Any PALcode accesses to the kernel stack that would 
produce a memory-management fault will result in a kernel-stack-not-valid halt. The stack 
pointer must always point to a quadword-aligned address. If the kernel stack is not quadword 
aligned on a PALcode access, a kernel-stack-not-valid halt is initiated. 


5.6 Revision History


Revision 7.0, November 10, 1997


1. Alpha AXP --> Alpha
2. DEC OSF/1 --> Digital UNIX
3. Digital --> DIGITAL


Revision 6.0, December 1994


1. OSF/1 --> DEC OSF/1
2. Alpha --> Alpha AXP
3. Added ECO 50, Performance monitors
4. Added ECO 51, Machine checks


Revision 1.0, May 12, 1992


1. First review distribution


A


Access violation fault, 3-12 

Address space match (ASM), bit in PTE, 3-5

Address space number (ASN) register 
defined, 1-2 


described, 3-11 

in process context, 4—1 
Address translation 

physical, 3-7 

virtual, 3-9 
Arithmetic trap entry (entArith) register, 1-2, 5-4 
Arithmetic traps 

division by zero, 5-6 

inexact result, 5—5 

integer overflow, 5—5 

invalid operation, 5-6 

overflow, 5-5 

system entry for, 5-4 

underflow, 5-5 


bpt (PALcode) instruction, 2-2 
Breakpoint trap, initiating, 2-2 
bugchk (PALcode) instruction, 2-3 
Byte_within_page field, 3-2 


C 


Caches, flushing physical page from, 2-11 


callsys (PALcode) instruction, 2-4 
entSys with, 5-9 
stack frames for, 5-3 


cflush (PALcode) instruction, 2—11 


Charged process cycles register 
in process context, 4—1 
clrfen (PALcode) instruction, 2-5 


cserve (PALcode) instruction, 2—12 




D 


Data alignment traps, system entry for, 5-9 
Division by zero trap, 5-6 

DPC bit, machine check error summary register, 5-8 
DSC bit, machine check error summary register, 5-8 
DZE bit, exception summary register, 5-6 


E


entArith. See Arithmetic trap entry 
entIF. See Instruction fault entry
entInt. See Interrupt entry 
entMM. See Memory management fault entry 
entSys. See System call entry 
entUna. See Unaligned access fault 
Errors, correctable processor, 5—8 
Errors, correctable system, 5-8 
Exception register write mask, 5—6 
Exception summary register, 5—2, 5-5 
format of, 5-5 


Exceptions 


defined, 5—1 
stack frames for, 5-3 


F 


Fault on execute (FOE), 3-12
bit in PTE, 3-5

Fault on read (FOR), 3-12
bit in PTE, 3-5

Fault on write (FOW), 3-12
bit in PTE, 3-5

Faults 


defined, 5-1 

fault on execute, 3-12 

fault on read, 3-12 

fault on write, 3-12 
memory management, 3-12 
FEN. See Floating-point enable


Floating-point enable (FEN) register 
defined, 1-3 
in process context, 4—1 


Floating-point registers. See Registers
FP. See Frame pointer 
Frame pointer (FP) register, linkage for, 1-1 


G 


gentrap (PALcode) instruction, 2-6 
Global pointer (GP) register, linkage for, 1-1 
Granularity hint (GH), bits in PTE, 3-5 


H 


Hardware context, 4—1 


Hardware interrupts, servicing, 5-7 


INE bit, exception summary register, 5-5 

Inexact result trap, 5-5 

Instruction fault entry (entIF) register, 1-2, 5-4, 5-7 
Instruction fault, system entry for, 5—4 

Integer overflow trap, 5-5 

Integer registers. See Registers 

Interprocessor interrupt, generating, 2-28 

Interrupt entry (entInt) register, 1-2, 5-4, 5-7 
Interrupt priority level (IPL), PS with, 5-2 


Interrupts


sources for, 5—2 
stack frames for, 5-3 
system entry for, 5-4 


intr_flag register, 1-3 

INV bit, exception summary register, 5-6 
Invalid operation trap, 5-6 

IOV bit, exception summary register, 5—5 


K 


Kernel global pointer (KGP) register, 1-3 
Kernel read enable (KRE), bit in PTE, 3-4 


Kernel stack pointer (KSP) register 
defined, 1-3 
in process context, 4-1 


Kernel write enable (KWE), bit in PTE, 3-4
KGP. See Kernel global pointer 
Kseg 
format of, 3-2

mapping of, 3-1 

physical space with, 3-3 
KSP. See Kernel stack pointer 


L 


lock_flag register, 1-3 
M 
Machine check error summary (MCES) register 
defined, 1-3 
reading, 2-13 
structure of, 5-7 
writing, 2-30 
Machine checks, interrupt entry for, 5-7 
maxCPU, 1-2 


MCES. See Machine check error summary 





Memory management 


control of, 3-3 
faults, 3-12 


Memory management fault entry (entMM) register, 1-3, 5-4, 5-9


Memory management faults 
system entry for, 5-4 
types, 3-12
Memory protection, 3-6 
MIP bit, machine check error summary register, 5-8 
MMCSR, 5-7 
MMCSR code, 3-12 


O 


opDec, 1-2 





Overflow trap, 5-5 
OVF bit, exception summary register, 5-5


P


Page frame number (PFN) 
bits in PTE, 3-4 
with physical address translation, 3-7 
Page sizes, 3-2 
Page table base (PTBR) register 
defined, 1-4 
in process context, 4-1
with physical address translation,
Page table entry (PTE) 
bits, summarized, 3-4 
changing, 3-5 
changing and managing, 3-5 
format of, 3-4 
virtual access of, 3-9

Pages, size range of, 3-1 

pageSize, 1-2 

PALcode 
DIGITAL UNIX support for, 5-9 
switching, 2-22 

PALcode instructions 


DIGITAL UNIX privileged (list), 2-10 
DIGITAL UNIX unprivileged (list), 2-1 


PALcode instructions, DIGITAL UNIX privileged 


cache flush, 2-11 

console service, 2-12 

performance monitoring function, 2-31 

read machine check error summary, 2-13 

read processor status, 2-14 

read system value, 2-16 

read user stack pointer, 2-15 

return from system call, 2-17 

return from trap, fault, or interrupt, 2-18 

swap IPL, 2-21 

swap PALcode image, 2-22 

swap process context, 2-19 

TB (translation buffer) invalidate, 2-24

wait for interrupt, 2-35 

who am I, 2-25 

write floating-point enable, 2-27

write interprocessor interrupt request, 2-28

write kernel global pointer, 2-29

write machine check error summary, 2-30

write system entry address, 2-26

write system value, 2-33

write user stack pointer, 2-32

write virtual page table pointer, 2-34 
PALcode instructions, DIGITAL UNIX unprivileged 

breakpoint, 2-2 

bugcheck, 2-3 

clear floating-point enable, 2-5 

generate trap, 2-6 

read unique value, 2-7 

system call, 2-4 

write unique value, 2-8, 2-9 


PC. See Program counter 

PCB. See Process control block 

PCBB. See Process control block base 

PCE bit, machine check error summary register, 5-8 


Performance monitoring enable (PME) bit 
defined, 1-4 
in process context, 4-1 

Performance monitoring register (PERFMON) 
writing, 2-31 

Physical address space, 3-3 

Physical address translation, 3-7 

PME. See Performance monitoring enable 

Process context, 4-1
saved in PCB, 4-2 

Process control block (PCB), 4-2 
structure, 4-2
Process control block (PCB) register, 1-3
Process control block base (PCBB) register, 1-3
Process unique value (unique) register, 1-4
in process context, 4-1
Processor cycle counter (PCC) register
for DIGITAL UNIX, 1-4
Processor status (PS) register
bit meanings for, 5-2
defined, 1-4
in process context, 4-1


Program counter (PC) register 


defined, 1-3 
in process context, 4-1 
with arithmetic traps, 5-1 


Protection code, 3-6 
PS. See Processor status 
PTBR. See Page table base 


R 


rdmces (PALcode) instruction, 2-13 
rdps (PALcode) instruction, 2-14 
rdunique (PALcode) instruction, 2-7
rdusp (PALcode) instruction, 2-15
rdval (PALcode) instruction, 2-16
Registers, DIGITAL UNIX usage, 1-1 


retsys (PALcode) instruction, 2-17
PS with, 5-2 

rti (PALcode) instruction, 2-18 
PS with, 5-2 
with exceptions, 5-1 





S


SCE bit, machine check error summary register, 5-8 

Seg0, mapping of, 3-1 

Seg1, mapping of, 3-1

Software completion bit, exception summary register, 
5-6 

Stack frames, 5-3 


Stack pointer (SP) register 
defined, 1-4 
linkage for, 1-1 
SWC bit, exception summary register, 5-2, 5-6 
swpctx (PALcode) instruction, 2-19 
PCB with, 4-2 
with ASNs, 3-10 
swpipl (PALcode) instruction, 2-21 
PS with, 5-2 
swppal (PALcode) instruction, 2-22
Synchronous traps, 5-2 

System call entry (entSys) register, 1-3, 5-4, 5-9 
System entry addresses, 5-4

System value (sysvalue) register, 1-4 

Sysvalue. See System value 


T


tbi (PALcode) instruction, 2-24
with TBs, 3-10 
Translation 
physical, 3-7 
virtual, 3-9 
Translation buffer (TB), 3-10 
Translation not valid fault, 3-12 
Trap shadow, 5-2 


Trigger instruction, 5-2 


U 


Unaligned access fault 

system entry for, 5-4
Unaligned fault entry (entUna) register, 1-3, 5-9 
Underflow trap, 5-5 


UNF bit, exception summary register, 5-5





Unique, process unique value, 1-4 
User read enable (URE), bit in PTE, 3-4


User stack pointer (USP) register 


defined, 1-4 
in process context, 4-1 


User write enable (UWE), bit in PTE, 3-4 
USP. See User stack pointer 


V 


Valid (V), bit in PTE, 3-5 
vaSize, 1-2 

Virtual address space, 3-1 
Virtual address translation, 3-9 
Virtual format, 3-2 


Virtual page table pointer (VPTPTR), 1-4 
with address translation, 3-9 
VPTPTR. See Virtual page table pointer 


W 


whami (PALcode) instruction, 2-25 


whami, current processor number, 1-4
wrent (PALcode) instruction, 2-26
wrfen (PALcode) instruction, 2-27
wripir (PALcode) instruction, 2-28
wrkgp (PALcode) instruction, 2-29
wrmces (PALcode) instruction, 2-30
wrperfmon (PALcode) instruction, 2-31
wrunique (PALcode) instruction, 2-8, 2-9
wrusp (PALcode) instruction, 2-32
wrval (PALcode) instruction, 2-33
wrvptptr (PALcode) instruction, 2-34
wtint (PALcode) instruction, 2-35
This section describes how a particular implementation of the Windows NT Alpha operating
system relates to the Alpha architecture. It is important to note the following: 


The interfaces described in this section will change as necessary to support the 
Microsoft Windows NT operating system. 


Effectively, many of the interfaces described in this section are private agreements 
between the PALcode and the kernel. Other software should not assume that those 
interfaces are available. 


In particular, the interfaces in this section must not be used by software developers who 
are writing device drivers; instead use the portable Windows NT device driver inter- 
faces. 


The only interfaces in this section that may be used by nonsystem software are the bpt, 
rdteb, and gentrap PALcode instructions. 


The following chapters are included in this section: 


Chapter 1, Introduction to Windows NT Alpha Software (II-C)
Chapter 2, Processor, Process, Threads, and Registers (II-C)
Chapter 3, Memory Management (II-C)
Chapter 4, Exceptions, Interrupts, and Machine Checks (II-C)
Chapter 5, PALcode Instruction Descriptions (II-C)
Chapter 6, Initialization and Firmware Transitions (II-C)
Chapter 1


Introduction to Windows NT Alpha Software (II-C)


The primary goal of the Windows NT Alpha PALcode implementation is total compatibility 
with the base operating system design and existing implementations of Windows NT for all 
processor architectures. Maintaining compatibility with Windows NT and software portability 
between versions of Windows NT requires the stipulations mentioned in the introduction to 
this section. It is important that all software developers read those stipulations. 


The PALcode mechanism, coupled with the Windows NT Alpha design, provides binary com- 
patibility for native system components across different processor implementations. The 
PALcode also provides a clean abstracted processor model that matches Windows NT require- 
ments, requires minimal porting effort for new platforms, and provides the best possible 
performance while offering those features. 


Windows NT Alpha is a 32-bit operating system. Therefore, the PALcode is a 32-bit imple- 
mentation, with, for example, a 32-bit virtual address space. The internal processor registers 
are 32 bits, in canonical longword format. The page table entry (PTE) format is also 32 bits. 
The PALcode manages any required transformation between the 32-bit processor-independent
formats and the 64-bit internal processor.


A Windows NT Alpha PALcode image is processor specific and platform independent. A sin- 
gle version of the PALcode (for a particular processor implementation) runs on all systems. 
The difference between processors is entirely hidden by the PALcode for each implementa- 
tion. Thus, the PALcode interface allows the Windows NT Alpha operating system images to 
be binary-compatible across different processor implementations. 


The PALcode image is read from the disk during the boot process, like all other components 
of the running operating system. The boot environment PALcode need only support the com- 
mon swppal instruction to allow the operating system to load and initialize the PALcode. 


Some functions and parameters must be implemented on a per-platform basis. Platform-depen- 
dent functions are implemented in the HAL (hardware abstraction layer), which is a 
system-specific library, loaded and dynamically linked at boot time. 


The basic Windows NT Alpha design, therefore, consists of a platform-independent PALcode 
definition and binary-compatible kernel with system-dependent functions in the HAL. 


The PALcode was designed to work smoothly and quickly with the Windows NT Alpha ker- 
nel. For example, the PALcode builds Windows NT Alpha trap frames and passes Windows 
NT Alpha status codes. Wherever possible, parameters and return values are passed in regis- 
ters between the kernel and the PALcode. 
The PALcode was also designed to keep dependencies on the kernel to a minimum. For exam-
ple, only the processor control region and the kernel trap frame definition are shared between
the PALcode and the Windows NT Alpha kernel. 


1.1 Overview of System Components 


The kernel is a binary-compatible image that can run on any Alpha processor, platform, or sys- 
tem. The kernel is binary compatible because of cooperation between it and other system 
components that provide the processor- and system-specific functions. Those cooperating com- 
ponents are the firmware, the OS Loader, the HAL (hardware abstraction layer), and the 
PALcode. 


The firmware and OS Loader are the first components in the boot sequence and are responsi-
ble for establishing the environment in which the kernel, HAL, and PALcode execute. The
kernel reads the configuration information provided by the firmware through the OS Loader
and uses the standard interfaces provided by the HAL and the PALcode.


Firmware 


The firmware performs the following functions in the boot sequence:


1. Establishes the privileged environment in which the OS Loader executes and the kernel 
begins executing (that is, provides memory management support and the swppal 
instruction). 


2. Provides platform- and configuration-dependent services to the OS Loader (such as I/O
services) by using ARC call-back routines.


3. Creates the configuration database: devices, memory size, and so forth.


4. Reads the OS Loader from the disk and executes it. 


OS Loader 


The OS Loader is a linking loader that reads the component operating system images from the 
disk, performs necessary relocation, and binds the dynamically linked images together. The 
OS Loader loads the appropriate HAL and PALcode, based on the configuration information 
provided by the firmware. 

The OS Loader loads the appropriate boot drivers as read from the operating system configura- 
tion files. The OS Loader also builds the loader parameter block structure by using 
information provided by the firmware. The loader parameter block includes configuration 
information (processor, system, device, and memory configuration) and per-processor data 
structures. 


Once the operating system components are loaded, the OS Loader jumps to the beginning of 
the kernel to begin execution of the operating system. The OS Loader loads the operating sys- 
tem PALcode on a 64K-byte-aligned address. The kernel activates the operating system 
PALcode by executing the swppal instruction. 
Hardware Abstraction Layer (HAL)


The HAL provides the system-specific layer between the kernel and the system hardware. The 
HAL provides interfaces for the following types of functions: 


• Interrupt handling, including dispatch and acknowledge
• DMA control
• Timer support
• Low-level I/O support
• Cache coherency


If a processor implementation requires PALcode intervention to support any of those func- 
tions, then the PALcode must support those processor-specific functions in a 
system-independent manner. 


PALcode 


The PALcode is specific to a particular processor implementation and must hide the internal 
workings of the processor from the kernel. The PALcode for a particular processor may 
include per-processor functions, but they must be called only by the HAL. 


1.2 Calling Standard Register Usage 


Table 1-1: General-Purpose Integer Registers 


Register Number   Symbolic Name   Volatility    Description

r0                v0              Volatile      Return value register
r1-r8             t0-t7           Volatile      Temporary registers
r9-r14            s0-s5           Nonvolatile   Saved registers
r15               s6/fp           Nonvolatile   Saved register/frame pointer
r16-r21           a0-a5           Volatile      Argument registers
r22-r25           t8-t11          Volatile      Temporary registers
r26               ra              Volatile      Return address register
r27               t12             Volatile      Temporary register
r28               at              Volatile      Assembler temporary register
r29               gp              Nonvolatile   Global pointer
r30               sp              Nonvolatile   Stack pointer
r31               zero            Constant      RAZ / writes ignored
Table 1-2: General-Purpose Floating-Point Registers


Register Number   Volatility    Description

f0                Volatile      Return value register (real part)
f1                Volatile      Return value register (imaginary part)
f2-f9             Nonvolatile   Saved registers
f10-f15           Volatile      Temporary registers
f16-f21           Volatile      Argument registers
f22-f30           Volatile      Temporary registers
f31               Constant      RAZ / writes ignored


1.3 Code Flow Conventions


The code flows are shown as an ordered sequence of instructions. The instructions in the 
sequence may be reordered as long as the results of the sequence of instructions are not 
altered. In particular, if an instruction j is listed subsequent to an instruction i and i writes any
data that is used by j, then i must be executed before j.
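

The ordering rule above can be illustrated with a small checker. This is a minimal sketch, not part of the architecture definition: the Instr record, the register names in the example, and the function name are hypothetical, and the check covers only the stated rule (a writer must still precede any later-listed instruction that uses its data). It does not verify the broader requirement that the results of the sequence are unaltered.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str          # unique label for the instruction (hypothetical)
    reads: frozenset   # data the instruction uses
    writes: frozenset  # data the instruction writes


def reordering_respects_rule(listed, reordered):
    """Return True if each writer i still precedes every later-listed user j."""
    position = {ins.name: k for k, ins in enumerate(reordered)}
    for a, i in enumerate(listed):
        for j in listed[a + 1:]:
            # i is listed before j; if j uses data written by i,
            # i must also come first in the reordered sequence.
            if i.writes & j.reads and position[i.name] > position[j.name]:
                return False
    return True


# Hypothetical three-instruction code flow: i2 uses the value i1 writes.
listed = [
    Instr("i1", frozenset({"a0"}), frozenset({"t0"})),
    Instr("i2", frozenset({"t0"}), frozenset({"t1"})),
    Instr("i3", frozenset({"a1"}), frozenset({"t2"})),
]
```

In this example i3 is independent and may move ahead of i1, but i2 may not be moved ahead of i1.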
1.4 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP -> Alpha
2. Windows NT AXP -> Windows NT Alpha


Revision 6.0, December 12, 1994
1. Created from Windows NT AXP specifications
Chapter 2


Processor, Process, Threads, and Registers (II-C)


This chapter describes structures and registers that support the processor, process, and thread 
environment. 


2.1 Processor Status 


The processor status register (PSR) defines the processor status. The PSR is shown in Figure
2-1 and described in Tables 2-1, 2-2, and 2-3.


Figure 2-1: Processor Status Register 


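 31                          5   4  3  2    1     0
+-----------------------------+----------+----+------+
|           RAZ/IGN           |   IRQL   | IE | MODE |
+-----------------------------+----------+----+------+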


Table 2-1: Processor Status Register Fields 


Field Type Description 


IRQL RW Interrupt request level, in the range 0-7, as described in Table 2-2. 
Any interrupt disabled at a lower priority level is also disabled at a 
higher priority level. 


IE RW Interrupt enable: 
0 = interrupts disabled 
1 = interrupts enabled 
A global interrupt enable to turn interrupts on and off without changing 
the IRQL. 


MODE RW Processor mode: 
0 = kernel mode 
1 = user mode 
Describes the current processor privilege mode: user (unprivileged) or 
kernel (privileged). The processor privilege mode defines the instruc- 
tions that can be executed and the memory protection that is used, as 
described in Table 2-3.
Table 2-2: Processor Status Register IRQL Field Summary


IRQL   Name                 Description

0      PASSIVE_LEVEL        All interrupts enabled.
1      APC_LEVEL            APC software interrupts disabled.
2      DISPATCH_LEVEL       Dispatch software interrupts disabled.
3      DEVICE_LEVEL         Low-priority device hardware interrupts disabled.
4      DEVICE_HIGH_LEVEL    High-priority device hardware interrupts disabled.
5      CLOCK_LEVEL          Clock hardware interrupts disabled.
6      IPI_LEVEL            Interprocessor hardware interrupts disabled.
7      HIGH_LEVEL           All maskable interrupts disabled.
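

The PSR fields can be packed and unpacked as sketched below. This is an illustrative sketch only: it assumes the bit positions that Figure 2-1 appears to show (MODE in bit 0, IE in bit 1, IRQL in bits <4:2>, with the remaining bits RAZ/IGN), and the function names are hypothetical.

```python
# Assumed bit positions, read from Figure 2-1 (not confirmed elsewhere in this text).
PSR_MODE_SHIFT = 0
PSR_IE_SHIFT = 1
PSR_IRQL_SHIFT = 2
PSR_IRQL_MASK = 0x7

# IRQL names from Table 2-2, indexed by IRQL value.
IRQL_NAMES = (
    "PASSIVE_LEVEL", "APC_LEVEL", "DISPATCH_LEVEL", "DEVICE_LEVEL",
    "DEVICE_HIGH_LEVEL", "CLOCK_LEVEL", "IPI_LEVEL", "HIGH_LEVEL",
)


def encode_psr(irql, ie, user_mode):
    """Pack IRQL (0-7), IE (0 or 1), and MODE into a PSR value."""
    if not 0 <= irql <= 7:
        raise ValueError("IRQL must be in the range 0-7")
    return ((irql << PSR_IRQL_SHIFT)
            | ((1 if ie else 0) << PSR_IE_SHIFT)
            | ((1 if user_mode else 0) << PSR_MODE_SHIFT))


def decode_psr(psr):
    """Unpack a PSR value; bits <31:5> are ignored (RAZ/IGN)."""
    irql = (psr >> PSR_IRQL_SHIFT) & PSR_IRQL_MASK
    return {
        "irql": irql,
        "irql_name": IRQL_NAMES[irql],
        "ie": (psr >> PSR_IE_SHIFT) & 1,
        "mode": "user" if (psr >> PSR_MODE_SHIFT) & 1 else "kernel",
    }
```

Under these assumptions, a PSR with IRQL=7, IE=1, and MODE=0 (kernel) encodes as 0x1E.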


Table 2-3: Processor Privilege Mode Map 


Operation                         Privileged            Unprivileged

Superpage access                  Yes                   No
Page protection                   Access to all pages   Access to only those pages with the Owner bit = 1
Privileged PALcode instructions   Yes                   No


2.2 Internal Processor Register Summary 


The internal processor registers in Table 2-4 are defined across all implementations. Imple-
mentation of these registers within the processor is implementation dependent.


Table 2-4: Internal Processor Register Summary


Name               Initial Value   Description

ASN                0               Address space number of owning process of current thread
GENERAL_ENTRY      0               General exception class kernel handler address
IKSP               0               Initial kernel stack pointer
INTERRUPT_ENTRY                    Interrupt exception class kernel handler address
KGP                                Kernel global pointer
MCES                               Machine check error summary
MEM_MGMT_ENTRY                     Memory management exception class kernel handler address
PAL_BASE           †               PALcode image base address
PANIC_ENTRY        0               Panic exception class kernel handler address
PCR                †               Processor control region base address
PDR                0               Page directory base address
PSR                †               Processor status register
RESTART_ADDRESS    †               Restart execution address
SIRR               0               Software interrupt request register
SYSCALL_ENTRY      0               System service exception class kernel handler address
TEB                0               Thread environment block base address
THREAD             0               Thread unique value (kernel thread address)


† The register has an architected initial value. See the register description in Table 2-5.
2.3 Internal Processor Registers
Table 2-5 lists and describes the internal processor registers.
Table 2-5: Internal Processor Registers 


Name Description 


ASN Address space number of owning process of current thread


Bits <15:0> of the ASN register contain the address space 
number for the current process. Bits <31:16> are RAZ. 


The ASN is a process tag that is used by the processor to qual- 
ify each virtual translation. When translations are qualified, it 
is not necessary for the processor to flush all virtual transla- 
tions for previous processes when performing a context swap 
or process swap. The swpctx and swpprocess instructions pro- 
vide the ASN. 


GENERAL_ENTRY General exception class kernel handler address


The GENERAL_ENTRY register contains the entry address 
(in 32-bit superpage format) for the kernel exception handler 
for the General class of exceptions. The wrentry instruction 
writes GENERAL_ENTRY. 


IKSP Initial kernel stack pointer


The IKSP register contains the initial kernel stack address. 
IKSP points to the top of the kernel stack for the currently exe- 
cuting thread. The rdksp instruction reads IKSP and the 
swpksp instruction writes IKSP. IKSP is also written by 
swpctx and during system initialization by initpal. 


INTERRUPT_ENTRY Interrupt exception class kernel handler address


The INTERRUPT_ENTRY register contains the entry address 
(in 32-bit superpage format) of the kernel exception handler 
for the Interrupt class of exceptions. The wrentry instruction 
writes INTERRUPT_ENTRY. 


KGP Kernel global pointer 


The KGP register contains the kernel global pointer, the gp 
value. The PALcode restores the kernel global pointer to the 
general-purpose register gp whenever dispatching to a kernel 
exception handler. The initpal instruction writes the KGP. 


MCES Machine check error summary 


The MCES register is used to report and control the current 
state of machine check handling. The MCES register contains 
multiple fields that are described in Section 4.3. The initial val- 
ues for the MCES register fields DSC, DPC, and DMK are 
implementation specific, and all other fields set to 0. The rec- 
ommended initial values are DMK = 0, DPC = 1, and DSC = 1. 


MEM_MGMT_ENTRY Memory management exception class kernel handler address


The MEM_MGMT_ENTRY register contains the entry 
address (in 32-bit superpage format) of the kernel exception 
handler for the Memory Management class of exceptions. The 
wrentry instruction writes MEM_MGMT_ENTRY. 


PAL_BASE PALcode image base address


The PAL_BASE register contains the physical address of the 
base of the currently active PALcode image. Its initial value is 
the address of the PALcode entry point. PAL_BASE controls 
which PALcode image is currently active and is written during 
PALcode initialization. The PAL_BASE register is illustrated 
and described in Section 6.2. 


PANIC_ENTRY Panic exception class kernel handler address


The PANIC_ENTRY register contains the entry address (in 
32-bit superpage format) of the kernel exception handler for 
the Panic class of exceptions. The wrentry instruction writes 
PANIC_ENTRY. 


PCR Processor control region base address 


The PCR register contains the base address (in 32-bit super- 
page format) of the processor control region page. The proces- 
sor control region is a page of per-processor data. The PCR is 
passed as an initialization parameter and the rdpcr instruction 
reads it. 


PDR Page directory base address 


The PDR register contains the base physical address of the 
page directory page. The page directory page contains all of 
the first-level page table entries (the page directory entries or 
PDEs). As such, the page directory page defines an address 
space for a process. The swpctx and swpprocess instructions 
write the PDR when the address space is swapped. The initpal 
instruction also writes the PDR. 


PSR Processor status register 


The PSR controls the privilege state and interrupt priority of 
the processor. The PSR register contains multiple fields that 
are described in Section 2.1. The initial values for the fields in 
the PSR are IRQL=7, IE=1, and MODE=0 (kernel). 


RESTART_ADDRESS Restart execution address 


The RESTART_ADDRESS register contains the address
where the processor resumes execution when the PALcode 
exits. For example, upon entry to each of the PALcode instruc- 
tions, the RESTART_ADDRESS register contains the virtual 
address + 4 of that instruction. The initial value of the 
RESTART_ADDRESS register is the kernel initialization con- 
tinuation address, passed as a parameter to the initialization 
routine. 


SIRR Software interrupt request register 


The SIRR register indicates requested software interrupts. 
SIRR contains multiple fields that are defined in Section 4.2.7. 


SYSCALL_ENTRY System service exception class kernel handler address 


The SYSCALL_ENTRY register contains the entry address 
(in 32-bit superpage format) of the kernel exception handler 
for the System Service class of exceptions. The wrentry 
instruction writes SYSCALL_ENTRY. 


TEB Thread environment block base address


The TEB register contains the address of the user thread envi-
ronment block. Each swpctx instruction writes the TEB; the
rdteb instruction reads it.


THREAD Thread unique value (kernel thread address) 


The THREAD register contains the address of the currently 
executing kernel thread structure. Each swpctx instruction 
writes the THREAD register; the rdthread instruction reads it. 


2.4 Processor Data Areas 


The operating system per-processor data structure is the processor control region. The proces- 
sor control region is a one-page (superpage) data structure that stores information that may be 
specific to a particular architecture. This information is data that is shared between the PAL- 
code, the HAL, and/or the architecture-specific portions of the kernel. See Section 3.1 for 
information on the superpage. 


2.4.1 Processor Control Region 
The processor control region contains a number of data structures that are of importance to the 
PALcode, including: 


• A 3064-byte region that is reserved for the PALcode and is the only per-processor data
region available to the PALcode.


• The interrupt level table (ILT), which maps the interrupt enable masks for each possible
interrupt request level. The PALcode may continually read these masks or may read
them once and cache them inside the processor.


• The interrupt dispatch table (IDT), which contains the address of an interrupt handler
for each possible interrupt vector.


• The interrupt mask table (IMT), which maps each possible pattern of interrupt requests
to the highest priority interrupt vector and the corresponding synchronization level.


• The panic stack pointer.
• The restart block pointer.
• The firmware restart address.


The PALcode is responsible for initializing the PALcode base address field and several PAL-
code revision fields within the processor control region.


The rdpcr instruction returns the base address of the processor control region. 


2.4.2 PALcode Version Control 


The PALcode is responsible for writing version information in the processor control region. 
The PalMajorVersion, PalMinorVersion, and PalSequenceVersion are provided for mainte- 
nance and debugging. The PALcode writes these fields, but the values are implementation 
specific. 
The kernel may use the PalMajorSpecification and PalMinorSpecification fields for
check-pointing with the PALcode. 


The PALcode writes the specification fields with version numbers that correspond to the ver- 
sion of the specification to which the PALcode image complies. Minor revisions within the 
same major revision are backwards compatible. The kernel may read the PalMajorSpecifica- 
tion and determine if it is compatible with the version of the PALcode. If the kernel is not 
compatible (if the PalMajorSpecification is greater than the kernel's PALcode major
specification), the kernel runs down in a controlled manner. 
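

The compatibility test described above reduces to a comparison of major specification numbers. The following is a minimal sketch under stated assumptions: the function names, the dictionary used to stand in for the processor control region fields, and the returned strings are hypothetical, and the actual kernel logic is not given in this section.

```python
def palcode_is_compatible(pal_major_spec, kernel_major_spec):
    """The kernel is incompatible only if the PALcode major spec is greater."""
    return pal_major_spec <= kernel_major_spec


def check_palcode(pcr_fields, kernel_major_spec):
    """Decide whether the kernel continues or runs down in a controlled manner.

    pcr_fields is a hypothetical stand-in for the version fields that the
    PALcode writes into the processor control region.
    """
    if palcode_is_compatible(pcr_fields["PalMajorSpecification"], kernel_major_spec):
        return "continue"
    return "run down"
```

Because minor revisions within a major revision are backwards compatible, the sketch does not consult PalMinorSpecification.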


The version agreement between the PALcode and the kernel is a private agreement between 
these two system components. No other system component, including the HAL and device 
drivers, may depend on any values from those fields. 


2.4.3 PALcode Alignment Fixup Count 


PALcode must maintain a count in the processor control region PalAlignmentFixupCount 
field of the total number of alignment fixups that the PALcode accomplishes. PalAlignment- 
FixupCount is an unsigned quadword field that is incremented by one when the PALcode 
fixes up an alignment fault. The field silently overflows to zero. 


The kernel may use the PalAlignmentFixupCount field for determining the total number of 
alignment fixups on a system by adding the value in that field for each processor to the num- 
ber of alignment fixups done by the kernel. 
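

The counter behavior and the system-wide total can be sketched as follows. The function names are hypothetical; the sketch models only what is stated above: an unsigned quadword (64-bit) count that is incremented by one per fixup and silently overflows to zero, and a total formed by adding the per-processor counts to the kernel's own count.

```python
QUADWORD_MASK = (1 << 64) - 1  # unsigned quadword: 64 bits


def increment_fixup_count(count):
    """Add one fixup to PalAlignmentFixupCount, wrapping silently to zero."""
    return (count + 1) & QUADWORD_MASK


def total_alignment_fixups(per_processor_counts, kernel_fixups):
    """Sum the PALcode counts for each processor plus the kernel's fixups."""
    return sum(per_processor_counts) + kernel_fixups
```

Because the field wraps without notice, a total computed this way is meaningful only if no processor's count has overflowed since it was last sampled.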


2.5 Caches and Cache Coherency 


Implementations may include caches that are not kept coherent with main memory. The imb 
instruction provides an architected common way to make the instruction execution stream 
coherent with main memory. The imb instruction guarantees that subsequently executed 
instructions are fetched coherently with respect to main memory on only the current processor. 


User-mode code that directly modifies the instruction stream, either through writes or by 
DMA from an I/O device, must call the appropriate Windows NT API to ensure I-cache coher- 
ency. User-mode code that uses standard APIs to modify the instruction stream works as 
expected and is handled by the APIs themselves. 


The PALcode does not architect a method for future processors to make potentially incoher-
ent data streams coherent. The first implementation processor maintains data stream
coherency. For any systems that require the ability to make the data stream coherent or to
flush a write-back cache, native code can be implemented via the standard HAL interfaces.
However, if a future implementation (for whatever reason) does not permit native code to be
able to either make the data stream coherent or flush internal write-back data caches, that
implementation must provide a PALcode instruction to do so. The interface to such a PAL-
code instruction affects the HAL only for those systems that use the implementation.
2.6 Stacks


There are four stacks:


• Kernel stack


Each thread is allocated its own pages for a kernel stack. The kernel stack is the two
pages of virtual address space below the IKSP for a thread, where the IKSP points to
the byte beyond the top of the two pages. The initial kernel stack pointer (IKSP)
points to the top of the currently active kernel stack for the current thread. Two
PALcode instructions provide access to the IKSP: rdksp to read the IKSP and swpksp
to atomically read the current IKSP and write a new one.


Must remain valid for the currently executing thread. Software must guarantee that the
kernel stack pointer remains 16-byte aligned.


• User stack


A per-thread stack on which all user-mode components are executed.


• Deferred procedure call (DPC) stack


A processor-wide stack upon which all deferred procedure calls are executed. Must
remain valid for the lifetime of the system.


• Panic stack


Allows the operating system to remain coherent through a system crash. Must remain
valid for the lifetime of the system.


The kernel, DPC, and panic stacks execute in kernel mode; the user stack executes in user 
mode. 
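

The kernel stack geometry described above can be expressed as a bounds and alignment check. This is a sketch under stated assumptions: the function names are hypothetical, and the page size is passed as a parameter because this section does not state it.

```python
def kernel_stack_bounds(iksp, page_size):
    """Return (low, high): the two pages below IKSP, with IKSP one past the top."""
    return iksp - 2 * page_size, iksp


def kernel_sp_is_valid(sp, iksp, page_size):
    """Check that sp lies within the kernel stack and is 16-byte aligned."""
    low, high = kernel_stack_bounds(iksp, page_size)
    # sp == high corresponds to an empty stack; sp == low to a full one.
    return low <= sp <= high and sp % 16 == 0
```

For example, with a hypothetical page size of 8192 bytes and an IKSP of 0x80020000, the kernel stack would occupy addresses 0x8001C000 through 0x8001FFFF.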


2.7 Processes and Threads 


Windows NT Alpha is designed as a multithread operating system with multiple threads exe-
cuting within the same process. Each thread has its own processor context, user-mode stack,
and kernel stack. Memory and the address space are shared across all threads in the same 
process. 


The PALcode "knows" nothing about the structure of threads or processes. The PALcode 
implements the means to swap from one thread context to another and to allow a thread to 
attach to the address space of another process. 


The state to accomplish these operations is passed entirely in registers. The PALcode main- 
tains the THREAD and TEB internal processor registers. They allow threads to query about 
the state of the currently executing thread. 


The THREAD register, a unique value identifying the current thread, is written when the 
thread context is swapped. The privileged instruction rdthread reads the THREAD register. 


The TEB register, a user-accessible pointer to the thread environment block for the new 
thread, is written when thread context is swapped. The unprivileged rdteb instruction reads the 
TEB register. Again, the PALcode knows nothing about the structure of the thread environ-
ment block; the PALcode simply maintains the TEB register value when context is switched.


2.7.1 Swapping Thread Context to Another Thread 


The swpctx instruction swaps the context from one thread to another thread. The following 
parameters are passed to swpctx: 


Initial kernel stack pointer
Swpctx must switch to the new kernel stack for the new thread. The initial kernel stack pointer 
is written to the internal processor register IKSP. 


THREAD internal processor register (unique thread value) 


TEB internal processor register (thread environment block pointer) 

These registers are maintained by the kernel and only written during a context switch. Implic- 
itly, the values in these registers for a particular thread cannot change while that thread is 
executing. 


PFN of the directory table base page for the new process 

ASN for the new process 

ASN_wrap_indicator 

The PFN and ASN allow switching to a new process address space. The PFN of the directory
table base page is an overloaded parameter; it is used to indicate if the process needs to be
swapped.


• The PFN is set to a negative value in the kernel if the previous thread and the new
thread are in the same process (address space). There is no need to swap the address
space if the two threads are in the same process. The values for the ASN parameters are
then UNPREDICTABLE.


• If the two threads are in different processes, the PFN is greater than or equal to zero and
is used to write the PDR internal processor register. When the PFN is valid (greater than
zero), the ASN must also be valid and is used to write the ASN internal processor register.


Swapping to a new process address space involves establishing a new directory pointer to the 
page table base page for the new process and possibly performing translation buffer opera- 
tions. A set ASN_wrap_indicator signals that the PALcode must perform an invalidation 
operation for each cached translation in the translation buffers and virtual caches that does not 
have the address space match (ASM) bit set. 
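
The address-space portion of swpctx described above can be sketched as follows. This is only an illustrative model, not PALcode: the Processor and TbEntry classes, the swap_address_space name, the 8 KB page size, and the choice to store the PDR as a byte address (PFN shifted by the page shift) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 13  # assumption: 8 KB pages (2**N with N = 13)


@dataclass
class TbEntry:
    vpn: int    # virtual page number of the cached translation
    asn: int    # address space number the translation belongs to
    asm: bool   # address space match bit: translation is global


@dataclass
class Processor:
    pdr: int = 0   # physical address of the page directory page
    asn: int = 0   # current address space number
    tb: list = field(default_factory=list)


def swap_address_space(cpu, pfn, asn, asn_wrap):
    """Model the address-space part of swpctx (hypothetical helper)."""
    if pfn < 0:
        # Negative PFN: both threads share a process, so nothing is swapped
        # and the ASN arguments are ignored.
        return False
    cpu.pdr = pfn << PAGE_SHIFT   # assumption: PDR holds a byte address
    cpu.asn = asn
    if asn_wrap:
        # ASN wrap: drop every cached translation without the ASM bit.
        cpu.tb = [entry for entry in cpu.tb if entry.asm]
    return True
```

In this model a same-process switch leaves the PDR, the ASN, and the translation buffer untouched, which is the point of overloading the PFN parameter.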


2.7.2 Swapping Thread Context to Another Process 


The swpprocess (swap process) instruction allows a thread to attach to another process (in 
another address space). Swpprocess requires the PFN of the new directory table base page and 
the new ASN as input. Swpprocess performs the same address space swapping operation as 
does swpctx when the PFN of the page directory is valid.


2.8 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP --> Alpha
2. Windows NT AXP --> Windows NT Alpha
3. Updated for NT 4.0
4. Removed ISP and ISP flag


Revision 6.0, December 12, 1994
1. Created from Windows NT AXP specifications


Chapter 3


Memory Management (II-C)


3.1 Virtual Address Space 


Windows NT Alpha is a 32-bit implementation with a 32-bit virtual address space, as
represented in Table 3-1.


Table 3-1: Virtual Address Map 


Address Range₁₆ (32 bits)    Permission         Description

00000000-7FFFFFFF            User and Kernel    General user address space

80000000-BFFFFFFF            Kernel             Nonmapped kernel space (32-bit superpage)

C0000000-C1FFFFFF            Kernel             Mapped, page table space

C2000000-FFFFFFFF            Kernel             Mapped, general kernel space


The address map takes advantage of the 32-bit superpage feature of the Alpha architecture. If 
the implementation of the 32-bit superpage is not done in hardware, it must be implemented in 
software (PALcode). The entire 1-GB address space mapped by the 32-bit superpage must be 
valid at all times for both instruction fetch and data access. 


Implementation Note (Hardware): 


It is strongly recommended that implementations include a hardware mapping of the 
32-bit superpage for both instruction and data stream. 


3.2 I/O Space Address Extension 


The Windows NT Alpha kernel implementation takes advantage of the architecture’s 64-bit 
address space to provide a nonmapped extended address for I/O space. The extended address 
space uses the 43-bit superpage that is available in the Alpha architecture. The superpage 
allows kernel mode access to an address space with a predetermined translation. Therefore, 
those accesses never require page table mapping or cause a translation buffer miss.


Implementation Note:


The extended address space is particularly important to Alpha implementations that do not 
include the BWX extension, because the bus mapping scheme for those implementations 
uses a shifted physical address, where the lower address bits are used to determine the 
byte enables. Therefore, the effective page size is smaller. See Appendix D for 
information about the BWX extension. 


The extended superpage provides nonmapped access to a 41-bit physical address space. The 
nonmapped superpage I/O accesses provide Alpha systems with a performance advantage 
because there is no need to write as many page table entries and to fill as many translation 
buffer misses as would be necessary without it. The extended address space is desirable 
because the likely physical address space is 34 bits or more and the 32-bit superpage can only 
allow accesses to 30 bits of physical address space. The extended address space is the only 
exception to the 32-bit virtual address map shown in Table 3-1. The extended address space is
intended for I/O access only and can only be used in kernel mode. The address mapping for 
the extended address space is shown in Table 3-2. 


Table 3-2: I/O Address Extension Address Map 


Address Range₁₆ (64 bits)              Permission    Description

FFFFFC0000000000-FFFFFDFFFFFFFFFF      Kernel        Nonmapped kernel mode I/O extension


3.3 Canonical Virtual Address Format 


All virtual addresses, with the exception of the large superpage addresses, must be in canoni- 
cal longword form. The PALcode must check the faulting virtual addresses in the first level 
miss flows and raise an exception if the addresses are not canonical longwords. The check is 
required because the processor may generate 64-bit addresses that are not canonical long- 
words, but the common memory management code only knows about 32-bit addresses and so 
cannot necessarily identify or signal the exception to the offending code. The PALcode cannot 
simply resolve the miss by using only the lower 32 bits. When the faulting instruction is 
re-executed, it attempts again to access the noncanonical address. If a virtual address fails the
canonical form test, the PALcode raises a general exception (see Section 4.1.7).
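
The canonical form test can be illustrated with a short sketch. It assumes that a canonical longword is a 64-bit value whose upper 32 bits are the sign extension of bit 31, and it treats the I/O extension range from Table 3-2 as the "large superpage" exception; the function names are hypothetical and the code models the check only, not the miss flow.

```python
MASK64 = (1 << 64) - 1

# Range taken from Table 3-2 (nonmapped kernel mode I/O extension).
SUPERPAGE_IO_FIRST = 0xFFFFFC0000000000
SUPERPAGE_IO_LAST = 0xFFFFFDFFFFFFFFFF


def sign_extend_longword(value):
    """Sign-extend the low 32 bits of value to an unsigned 64-bit pattern."""
    low = value & 0xFFFFFFFF
    return (low | 0xFFFFFFFF00000000) if low & 0x80000000 else low


def is_canonical_longword(va):
    """True if the 64-bit address equals the sign extension of its low 32 bits."""
    va &= MASK64
    return sign_extend_longword(va) == va


def needs_invalid_address_exception(va):
    """True if a faulting address should raise the general exception."""
    va &= MASK64
    if SUPERPAGE_IO_FIRST <= va <= SUPERPAGE_IO_LAST:
        return False  # large superpage addresses are exempt from the test
    return not is_canonical_longword(va)
```

For example, FFFFFFFF80000000 passes the test, while 0000000080000000 and 0000000100000000 do not, even though the low 32 bits of the latter would name a valid user address.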


3.4 Page Table Entries 


Page table entries (PTEs) provide the translation from virtual addresses to their physical 
addresses. The PTE includes the physical address in the form of a page frame number (PFN), 
protection information, and performance hints. The virtual address is related to a page table 
entry based solely upon the position of the PTE within a set of page tables.

Two methods may be used to traverse the page tables to retrieve the corresponding PTE for a
given virtual address. The first is to view the page tables as a single-level virtually contiguous
table. The second is to view the page tables as a two-level physical table.


3.4.1 Single-Level Virtual Traversal of the Page Tables


For a single-level virtual traversal, a virtual address must be viewed as shown in Figure 3-1, 
where 2**N is the implementation page size: 


Figure 3-1: Virtual Address (Virtual View) 


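 31                            N  N-1                        0
+--------------------------------+----------------------------+
|   Virtual Page Number (VPN)    |  Byte offset within page   |
+--------------------------------+----------------------------+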


To access the corresponding PTE for a VA (virtual address) using the single-level virtual 
method, use the following algorithm. 


! In the algorithm:
!   VIRTUAL_PTE_BASE = C0000000₁₆
!   PAGE_SHIFT = N
! Clear upper bits in case va is sign-extended:
va <- BYTE_ZAP( va, F0 )
! Get virtual page number:
vpn <- RIGHT_SHIFT( va, PAGE_SHIFT )
! 4 bytes per pte, offset + base:
pte_va <- VIRTUAL_PTE_BASE + ( vpn * 4 )
! Do a virtual load of pte:
pte <- (pte_va)
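
As a worked illustration of the address arithmetic in this algorithm, the following sketch computes the virtual address of the PTE without performing the final load. The 8 KB page size (N = 13) is an assumption chosen for the example, and pte_virtual_address is a hypothetical name.

```python
VIRTUAL_PTE_BASE = 0xC0000000
PAGE_SHIFT = 13  # assumption: 8 KB pages (N = 13)


def pte_virtual_address(va, page_shift=PAGE_SHIFT):
    """Return the virtual address of the PTE that maps va."""
    va &= 0xFFFFFFFF             # BYTE_ZAP(va, F0): clear bytes 4 through 7
    vpn = va >> page_shift       # virtual page number
    return VIRTUAL_PTE_BASE + vpn * 4   # 4 bytes per PTE
```

With these assumptions, the PTE for the second page of user space (va 00002000) lies at C0000004, and the PTE for the first page of the 32-bit superpage (va 80000000, possibly sign-extended) lies at C0100000.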


3.4.2 Two-Level Physical Traversal of the Page Tables 


The two-level physical method can be used to find the corresponding PTE for a virtual address 
when the virtual access method cannot be used (for example, if the PTE address is not valid). 
The key to physically traversing the page tables is the PDR internal processor register. The 
PDR is maintained on a per-process basis whenever process context is swapped. The PDR is 
the physical address of the page directory page that forms the first level of the page tables. The 
first level of the page tables easily fits within a single page. Each entry in the page directory 
page is called a PDE (page directory entry). One PDE maps one page of PTEs. 


A virtual address must be viewed as shown in Figure 3-2 for a two-level, physical traversal of 
the page tables. In Figure 3-2, 2**N is the implementation page size, and 2**P is (PTEs per 
page = page size / 4). 


Figure 3-2: Virtual Address (Physical View) 


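 31             N+P  N+P-1             N  N-1             0
+------------------+--------------------+------------------+
|  Page Directory  |     Page Table     |   Byte offset    |
|   Index (PDI)    |    Index (PTI)     |   within page    |
+------------------+--------------------+------------------+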


The following algorithm uses the two-level physical traversal method to access the
corresponding PTE for a VA (virtual address).


! In the algorithm:
!   PDE_SHIFT = N+P
!   PAGE_SHIFT = N
! Clear upper bits in case va is sign-extended:
va <- BYTE_ZAP( va, F0 )
! Get pde number:
pde_index <- RIGHT_SHIFT( va, PDE_SHIFT )
! 4 bytes per pde, index * 4 byte offset:
pde_offset <- pde_index * 4
! Offset + base:
pde_pa <- PDR + pde_offset
! Do a physical load of the page directory entry:
pde <- (pde_pa)
! Get PFN of pte page from pde:
pte_pfn <- pde<PFN>
! Get physical address of pte page:
pte_page <- LEFT_SHIFT( pte_pfn, PAGE_SHIFT )
! Extract page table index from virtual address:
pte_index <- va<pti>
! Calculate offset, 4 bytes per pte:
pte_offset <- pte_index * 4
! Address base + offset:
pte_pa <- pte_page + pte_offset
! Do a physical load to read the pte:
pte <- (pte_pa)
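
The two-level traversal can be exercised with a small simulation. In this sketch a Python dictionary stands in for physical memory, 8 KB pages are assumed (N = 13, so P = 11), and the PFN is assumed to occupy bits <31:9> of an entry; the function names and those constants are illustrative assumptions. The sketch mirrors the algorithm above and, like it, does not check the valid bit of the PDE.

```python
PAGE_SHIFT = 13                        # assumption: 8 KB pages (N = 13)
PTES_PER_PAGE_SHIFT = PAGE_SHIFT - 2   # P: 4-byte PTEs, so 2**N / 4 per page
PDE_SHIFT = PAGE_SHIFT + PTES_PER_PAGE_SHIFT
PFN_SHIFT = 9                          # assumption: PFN is in PTE bits <31:9>


def load_physical(memory, pa):
    """Read a 32-bit entry from the dict that stands in for physical memory."""
    return memory[pa]


def pte_physical_lookup(memory, pdr, va):
    """Return (pte_pa, pte) for va by walking the two-level page tables."""
    va &= 0xFFFFFFFF                               # clear sign-extension bits
    pde_index = va >> PDE_SHIFT                    # page directory index
    pde = load_physical(memory, pdr + pde_index * 4)
    pte_page = (pde >> PFN_SHIFT) << PAGE_SHIFT    # physical address of PTE page
    pte_index = (va >> PAGE_SHIFT) & ((1 << PTES_PER_PAGE_SHIFT) - 1)
    pte_pa = pte_page + pte_index * 4
    return pte_pa, load_physical(memory, pte_pa)
```

With a page directory at physical address 4000 whose entry 1 names page frame 5, the lookup for va 01002000 reads the PDE at 4004 and then the PTE at A004.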


Page directory entries are themselves page table entries and so they have the same format. 
There are some implications for DTB implementation because the PDEs establish a recursive 
mapping for addresses within the PTE address space. The implications and a description of the 
recursive mapping are described in Section 3.6. 


3.4.3 Page Table Entry Summary 


The format for a PTE is shown in Figure 3-3 and described in Table 3-3.


Figure 3-3: Page Table Entry


[Figure: 32-bit PTE layout, bits 31 through 0; the fields are listed in Table 3-3.]





Table 3-3: Page Table Entry Fields


Field   Description

PFN     Page frame number

SFW     Reserved for software (operating system)

GH      Granularity hints
        Optional hint that provides for mapping translations larger than the standard
        implementation page size. These large pages must be both virtually and
        physically aligned. Defines the translation in terms of a multiple of the page
        size, where the multiplier equals 8**N, where N is the granularity hint value in
        the range 0-3.

        Global translation hint (address space match)
        Optional hint that the indicated translation is global for all processes.

        Reserved

        Dirty:
        0 = page is not dirty
        1 = page is dirty
        Implemented as the inverse of fault on write (FOW). Serves double duty by
        causing faults for the first write to a page. Serves as a write-protect bit and as
        a marker that allows the operating system to track dirty pages.

        Owner:
        0 = kernel access only
        1 = user access permitted
        Indicates whether user mode is allowed access to this page, either for
        instruction fetch or data access. Kernel mode code has implied access to all
        pages that have a valid translation.

        Valid:
        0 = translation not valid
        1 = valid translation
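
A PTE can be unpacked into the fields of Table 3-3 along the following lines. The bit positions used here are an assumption for illustration (valid in bit 0, owner in bit 1, dirty in bit 2, bit 3 reserved, global hint in bit 4, GH in bits <6:5>, SFW in bits <8:7>, PFN in bits <31:9>); the names decode_pte, PteFields, and pages_mapped are hypothetical.

```python
from collections import namedtuple

PteFields = namedtuple("PteFields", "pfn sfw gh global_hint dirty owner valid")


def decode_pte(pte):
    """Split a 32-bit PTE into its fields (bit positions are assumed)."""
    return PteFields(
        pfn=(pte >> 9) & 0x7FFFFF,      # page frame number
        sfw=(pte >> 7) & 0x3,           # reserved for software
        gh=(pte >> 5) & 0x3,            # granularity hint, 0 through 3
        global_hint=bool(pte & 0x10),   # address space match
        dirty=bool(pte & 0x4),
        owner=bool(pte & 0x2),
        valid=bool(pte & 0x1),
    )


def pages_mapped(gh):
    """Number of standard pages covered by a translation with hint gh."""
    return 8 ** gh
```

The granularity hint multiplier follows directly from the table: hint values 0, 1, 2, and 3 cover 1, 8, 64, and 512 pages respectively.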


3.5 Translation Buffer Management 


As shown in Table 3-4, the PALcode provides the dtbis, tbia, tbim, tbimasn, tbis, and tbisasn 
instructions to manage the cached virtual translations maintained in the translation buffers and 
virtual caches. 


Table 3-4: Translation Buffer Management Instructions


Instruction   Operation

dtbis         Invalidates a single data stream translation for a specified address. It is
              designed for those cases when the operating system can determine that the
              translation is not used in the instruction stream. Implementations may
              advantageously use dtbis to avoid needing to invalidate instruction stream
              translations in both an instruction TB and a virtual I-cache.

tbia          Invalidates all page table translations for both instruction and data stream
              access. The translations invalidated are limited to "page table translations"
              because it is possible that an implementation has used fixed TB entries to
              implement one or more of the required superpages. These fixed translations
              are considered "hard-wired" by the operating system and must be valid at
              all times.

tbim          Invalidates multiple virtual translations, passed as a parameter, for the
              current ASN. Tbim invalidates translations for both instruction and data
              stream access.

tbimasn       Invalidates multiple virtual translations for a specified address space
              number (ASN), passed as a parameter. The ASN may or may not be for the
              currently executing thread. Tbimasn invalidates translations for both
              instruction and data stream access.

tbis          Invalidates a single translation for a specific virtual address, passed as a
              parameter. Tbis invalidates the translation for both instruction and data
              stream access.

tbisasn       Invalidates a translation for a single virtual address for a specified address
              space number (ASN). The ASN may or may not be for the currently
              executing thread. Tbisasn invalidates the translation for both instruction and
              data stream access.


On processors that implement physical, noncoherent instruction caches, instructions that invali- 
date I-stream translations must also invalidate instruction cache blocks from the physical 
pages that correspond to the invalidated virtual translations. 
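
The intent of three of these instructions can be modeled with a toy translation buffer. This is a sketch under simplifying assumptions: a single buffer holds both instruction and data stream entries, each entry is keyed by a page-aligned virtual address and an ASN, and a "fixed" flag marks the hard-wired superpage entries that tbia must leave in place. The class and its methods are hypothetical and do not describe any particular implementation.

```python
class TranslationBuffer:
    """Toy model of cached translations (illustrative only)."""

    def __init__(self):
        self.entries = []  # dicts with keys: va, asn, fixed

    def add(self, va, asn, fixed=False):
        self.entries.append({"va": va, "asn": asn, "fixed": fixed})

    def tbia(self):
        # Invalidate all page table translations; fixed entries survive.
        self.entries = [e for e in self.entries if e["fixed"]]

    def tbisasn(self, va, asn):
        # Invalidate the translation for one address in one address space.
        self.entries = [
            e for e in self.entries
            if e["fixed"] or not (e["va"] == va and e["asn"] == asn)
        ]

    def tbimasn(self, vas, asn):
        # Invalidate several addresses in one address space.
        for va in vas:
            self.tbisasn(va, asn)
```

Note that tbisasn and tbimasn in this model can target an ASN other than the current one, matching the statement in Table 3-4 that the ASN may or may not be for the currently executing thread.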


3.6 Implications of Recursive TB Mapping 


Recursive virtual mapping has an implication for data translation buffer implementations: it 
is possible for two identical translations to be written in the DTB during the same miss han- 
dling sequence. If the DTB cannot correctly operate with two identical translations, the 
PALcode must include additional checks to prevent the condition from occurring. 


The page tables can be viewed either as a virtual contiguous single-level table or as a 
two-level table that must be traversed physically. When viewed as a two-level table, the first 
level is a single page called the page directory page. Each page directory page entry, called a 
PDE, provides the first-level translation so that the TB-fill code can find the page table page 
that contains the PTE with the translation for the faulted virtual address. All page table pages 
are mapped by a PDE in the page directory page. 


The page tables are recursive. The page directory page is a standard page table page and it is 
virtually mapped in the single-level virtual page table. Therefore, there exists one PDE that 
maps the page directory page. The PDE that maps the page directory page in a two-level 
lookup is also the PTE that maps the page directory page for the single-level virtual mapping. 


This special PDE is called the root PTE or RPTE. 
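
The recursion can be made concrete by applying the single-level PTE address calculation of Section 3.4.1 to the page table region itself. The sketch below assumes 8 KB pages and the VIRTUAL_PTE_BASE value of C0000000; the variable names are illustrative.

```python
VIRTUAL_PTE_BASE = 0xC0000000
PAGE_SHIFT = 13  # assumption: 8 KB pages


def pte_virtual_address(va):
    """Virtual address of the PTE that maps va (single-level view)."""
    return VIRTUAL_PTE_BASE + ((va & 0xFFFFFFFF) >> PAGE_SHIFT) * 4


# The PTEs that map the page table region are the PDEs, so the page that
# holds them is the page directory page.
page_directory_va = pte_virtual_address(VIRTUAL_PTE_BASE)

# The PTE that maps the page directory page is the root PTE.
rpte_va = pte_virtual_address(page_directory_va)
```

Under these assumptions the page directory page appears at virtual address C0180000 and the RPTE at C0180300. Applying the calculation once more to the RPTE address returns the same address, which is the fixed point that makes the RPTE both a PDE and a PTE.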


Assume that the processor implementation has two data stream TB miss flows: one for the
misses taken in native mode and one for the misses taken in the PALcode environment. For
the case when a native-mode virtual access is made to the page directory page, PALcode takes
the following flows:


Native Miss Flow                            PALcode Environment Miss Flow

1. {get va for PTE that maps
   the faulted va: VA}

2. {get the PTE using its va}
      ldl rx, 0(ry)
      where ry <- va of PTE

                                            3. {ldl rx, 0(ry) from
                                               PALcode environment faulted}

                                            4. {resolve this fault by making the va
                                               of the missed PTE valid}

                                            5. {translation for RPTE is written
                                               into the DTB}

6. {re-execute the load that failed
   since the va of the PTE is now valid}

7. {load completes, rx <- RPTE}

8. {write the translation for the
   faulting va, VA, into the DTB}

9. {RPTE is now in the DTB twice}

10. {re-execute the original native-mode
    instruction that faulted when
    accessing VA}


Since there is only one PTE, RPTE, that exhibits this behavior, the PALcode can check the
faulting PTE address in the second-level fill routine to special case for RPTE. It is preferable
not to slow down even the second-level fill flow. However, this is a processor implementation
decision.


3.7 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP --> Alpha
2. Windows NT AXP --> Windows NT Alpha
3. Updated for NT 4.0
4. Add TBIM and TBIMASN information


Revision 6.0, December 12, 1994
1. Created from Windows NT AXP specifications


Chapter 4


Exceptions, Interrupts, and Machine Checks (II-C)


At certain times during the operation of a system, events within the system require the execu- 
tion of software outside the explicit flow of control. When such an exceptional event occurs, 
an Alpha processor forces a change in control flow from that indicated by the current instruc- 
tion stream. The notification process for such events is an exception, an interrupt, or a 
machine check. 


4.1 Exceptions 


4.1.1 Exception Dispatch 


When the processor encounters an exception, it traps to PALcode that provides preliminary 
exception dispatch for the operating system. Some exceptions, such as TB miss, may be han- 
dled entirely by the PALcode without the intervention of the operating system. 


The PALcode provides a simple and efficient method of dispatching to the operating system 
for those exceptions that require operating system action. In general, the following operations 
characterize exception dispatch: 


1. Switch to kernel mode (if in user mode).

2. Allocate a trap frame on the kernel stack.

3. Save the necessary processor state in the trap frame.

4. Prepare arguments to the kernel exception handler using the standard argument
   registers where possible.

5. Set the processor state for executing the kernel (establish the stack pointer so it points to
   the kernel stack, establish the global pointer to point to the kernel global area).

6. Restart execution at the address of the kernel exception handler registered for the class
   of exception that was encountered.


4.1.2 Exception Classes 


The PALcode classifies each exception into one of the following categories: 
• Memory management exceptions

Memory management exceptions, described in Section 4.1.5, are raised for:

- Translation not valid faults: accesses to addresses that do not have a valid
  translation for the currently executing context
- Access violations: accesses to addresses for which the currently executing context
  does not have permission for the access


• System service call exceptions

Although not really exceptions, system service calls are handled as exceptions to
allow unprivileged code to request and receive privileged services. System services
may be requested from both unprivileged and privileged modes (user and kernel mode
respectively). System service calls are described in Section 4.1.6.


• General exceptions

The general exception class, described in Section 4.1.7, is the catchall category for all
of the other exceptions that may be raised by unprivileged code:

- Arithmetic exceptions
- Unaligned memory access
- Illegal instruction execution
- Invalid (non-canonical virtual) address exceptions
- Software exceptions
- Breakpoints
- Subsetted instruction execution


• Panic exceptions

The panic exception class, described in Section 4.1.8, is reserved for conditions from
which execution cannot reliably be continued. The following general cases of panic
exceptions are anticipated:

- Invalid kernel stack (including overflow and underflow)
- Unexpected exceptions from PALcode


4.1.3 Returning from Exceptions 


The rfe and retsys instructions are provided for returning from exceptions.


The rfe (return from exception or interrupt) instruction allows the operating system to return 
from an exception. Rfe may also be used to transition from kernel mode to user-mode startup 
code. 


The rfe instruction reverses the effect of an exception by restoring the original processor state
from the trap frame on the kernel stack. In addition, rfe accepts a parameter that allows it to
set software interrupt requests for the execution context that is about to be reestablished.


Two exception classes do not use rfe to return to the previously executing context: system
service call and panic exceptions. The retsys instruction is used for returning from system service
call exceptions because a system service call has different semantics with regard to the saved
processor state than the other exceptions.
Panic exceptions do not return because they precipitate a controlled crash of the operating
system.


4.1.4 Trap Frames 


Trap frames are allocated on the kernel stack for all classes of exceptions in PALcode. The 
PALcode also partially writes the trap frame; the fields written are based upon the exception 
being handled. The kernel stack must be guaranteed to remain aligned on a 16-byte boundary, 
as specified in the Windows NT Alpha calling standard. The trap frame itself is guaranteed in 
size to be a multiple of 32 bytes. The PALcode may over-align the kernel stack pointer when 
allocating the trap frame in order to improve memory throughput, with consideration for the 
extra memory being consumed. The trap frame is structured so that writes aggregate. The reg- 
ister values stored in the trap frame are 64-bit values. This is required as the register set is 64 
bits and may contain 64-bit values (as opposed to canonical longwords). 


Trap frame definitions are shown in Table 4-1. 


Table 4-1: Trap Frame Definitions


Symbolic Name   Size       Description

TrIntSp         Quadword   Stack pointer register at point of exception
TrPsr           Longword   Processor status register at point of exception
TrFir           Quadword   Exception program counter
TrIntA0         Quadword   Register a0 at point of exception
TrIntA1         Quadword   Register a1 at point of exception
TrIntA2         Quadword   Register a2 at point of exception
TrIntA3         Quadword   Register a3 at point of exception
TrIntFp         Quadword   Frame pointer register at point of exception
TrIntGp         Quadword   Global pointer register at point of exception
TrIntRa         Quadword   Return address register at point of exception


4.1.5 Memory Management Exceptions 


PALcode recognizes two classes of memory management exceptions: translation not valid 
faults and access violations. Translation not valid faults are detected when a page table entry 
for a virtual address has the valid bit cleared. The invalid page table entry can be either a first- 
or second-level table entry. Access violations are detected by the hardware when the processor 
attempts to access a virtual address and that type of access is not permitted according to the 
protection mask in the page table entry that maps the translation for the virtual address. 


The PALcode dispatches to the kernel in the same manner for each of these two classes of
exceptions, according to the following description:


previousPSR <- PSR
if ( PSR<Mode> EQ User ) then
    PSR<Mode> <- Kernel
    tp <- (IKSP - TrapFrameLength)    ! Establish trap pointer
else
    tp <- (sp - TrapFrameLength)      ! Establish trap pointer
endif
TrIntSp(tp) <- sp
TrIntFp(tp) <- fp
TrIntRa(tp) <- ra
TrIntGp(tp) <- gp
TrIntA0(tp) <- a0
TrIntA1(tp) <- a1
TrIntA2(tp) <- a2
TrIntA3(tp) <- a3
TrFir(tp) <- ExceptionPC
TrPsr(tp) <- previousPSR
sp <- tp
RESTART_ADDRESS <- MEM_MGMT_ENTRY
fp <- sp
gp <- KGP
a0 <- 1 if store; 0 if load
a1 <- faulting virtual address
a2 <- previousPSR<Mode>
a3 <- previousPSR


All other general-purpose registers must be preserved across the memory management
exception dispatch.


If the kernel can resolve the fault, it uses the rfe instruction to restart the faulting thread, thus 
reissuing the instruction that faulted. Otherwise, the kernel raises the appropriate exception. 
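
The memory management dispatch sequence can be traced with a small model in which dictionaries stand in for the processor registers and for kernel stack memory. The trap frame length of 128 bytes, the mode encodings, and the function name are assumptions made for the example; only the order of operations follows the description in this section.

```python
TRAP_FRAME_LENGTH = 128   # assumption: any multiple of 32 bytes serves here
USER, KERNEL = 1, 0       # assumption: encodings are illustrative only


def dispatch_memory_fault(cpu, frames, exception_pc, fault_va, is_store):
    """Build a trap frame and set up the kernel handler arguments."""
    previous_psr = dict(cpu["psr"])
    if cpu["psr"]["mode"] == USER:
        cpu["psr"]["mode"] = KERNEL
        tp = cpu["iksp"] - TRAP_FRAME_LENGTH   # start from initial kernel stack
    else:
        tp = cpu["sp"] - TRAP_FRAME_LENGTH     # already on the kernel stack
    frames[tp] = {
        "TrIntSp": cpu["sp"], "TrIntFp": cpu["fp"], "TrIntRa": cpu["ra"],
        "TrIntGp": cpu["gp"], "TrIntA0": cpu["a0"], "TrIntA1": cpu["a1"],
        "TrIntA2": cpu["a2"], "TrIntA3": cpu["a3"],
        "TrFir": exception_pc, "TrPsr": previous_psr,
    }
    cpu["sp"] = tp
    cpu["fp"] = tp
    cpu["gp"] = cpu["kgp"]
    cpu["a0"] = 1 if is_store else 0
    cpu["a1"] = fault_va
    cpu["a2"] = previous_psr["mode"]
    cpu["a3"] = previous_psr
    return "MEM_MGMT_ENTRY"   # stands in for the restart address
```

Because the saved values are captured before the argument registers are overwritten, a later return from the exception can restore a0 through a3 exactly as the faulting thread left them.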


4.1.6 System Service Calls 


System service calls are initiated from both user and kernel modes via the callsys instruction.
The privileged retsys instruction returns from a system service back to the caller. The callsys
and retsys instructions are described in Sections 5.2.3 and 5.1.21, respectively.


4.1.7 General Exceptions 


General exceptions are those exceptions, other than memory management exceptions and
system service call exceptions, that can be raised by hardware or software. All general exceptions
are handled in approximately the same manner in the PALcode and in exactly the same
manner in the lowest level kernel exception dispatch.


• Arithmetic exceptions
• Unaligned access exceptions
• Illegal instruction exceptions
• Invalid (non-canonical virtual) address exceptions
• Software exceptions
• Breakpoints
• Subsetted IEEE instruction exceptions


A general exception builds a trap frame on the kernel stack and populates the exception record 
within the trap frame and then dispatches to the kernel general exception entry point. The com- 
mon dispatch for general exceptions is shown in Section 4.1.7.8. 


The differences between each type of exception are the population of the exception record and 
the meaning of the faulting instruction field within the trap frame. The values for each specific 
exception are detailed in the sections that follow. 


4.1.7.1 Arithmetic Exceptions 


An arithmetic trap occurs at the completion of the operation that caused the exception. Since 
several instructions may be in various stages of execution at any point in time, it is possible 
for multiple arithmetic traps to occur simultaneously. The intervening instructions (after the 
trigger instruction) are collectively called the trap shadow. See Common Architecture, Chap- 
ter 4, Arithmetic Trap Completion, for information. 


The ExceptionPC is written to the TrFir offset of the trap frame. The ExceptionPC written
into the trap frame is the virtual address of the first instruction after the trapping instruction
that has not yet executed.


Arithmetic traps write the following information into the exception record of the trap frame, 
where er is the exception record pointer: 


ErExceptionCode(er) <- STATUS_ALPHA_ARITHMETIC
ErExceptionInformation<0>(er) <- FLOATING_REGISTER_MASK
ErExceptionInformation<1>(er) <- INTEGER_REGISTER_MASK
ErExceptionInformation<2>(er) <- EXCEPTION_SUMMARY
ErNumberParameters(er) <- 3
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0


The floating register masks indicate which floating-point registers were destinations of instruc- 
tions that caused an exception. A one in the corresponding position for a register indicates that 
the register was the destination of an instruction that faulted. A zero indicates that the register 
was not the destination of an instruction that faulted. The definition of the correspondence 
between the floating registers and the bits in the mask is shown in Figure 4-1. 


Figure 4-1: Floating-Point Register Mask (FLOAT_REGISTER_MASK) 


 31  30  29                                2   1   0
+---+---+----------------------------------+---+---+
|F31|F30|          F29 through F2          |F1 |F0 |
+---+---+----------------------------------+---+---+


The integer register masks indicate which integer registers were destinations of instructions
that caused an exception. A one in the corresponding position for a register indicates that the
register was the destination of an instruction that faulted. A zero indicates that the register was
not the destination of an instruction that faulted. The definition of the correspondence between
the integer registers and the bits in the mask is shown in Figure 4-2.


Figure 4-2: Integer Register Mask (INTEGER_REGISTER_MASK)


 31  30  29                                2   1   0
+---+---+----------------------------------+---+---+
|R31|R30|          R29 through R2          |R1 |R0 |
+---+---+----------------------------------+---+---+


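The format of the exception summary register is shown in Figure 4-3 and the fields are
defined in Table 4-2.


Figure 4-3: Exception Summary Register (EXCEPTION_SUMMARY)


 31                          7   6   5   4   3   2   1   0
+----------------------------+---+---+---+---+---+---+---+
|            RAZ             |IOV|INE|UNF|OVF|DZE|INV|SWC|
+----------------------------+---+---+---+---+---+---+---+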


Table 4-2: Exception Summary Register Fields


Field   Name                  Description

RAZ                           Read as zero.

IOV     Integer overflow      Result of integer operation overflowed the destination's
                              precision.

INE     Inexact result        Result of floating operation caused loss of precision.

UNF     Underflow             Result of floating operation underflowed the destination
                              exponent.

OVF     Overflow              Result of floating operation overflowed the destination
                              exponent.

DZE     Division by zero      Floating-point divide attempt with a divisor of zero.

INV     Invalid operation     One or more of the operands of a floating-point operation
                              was an illegal value.

SWC     Software completion   The exception completion qualifier /S was selected for all
                              of the faulting instructions.
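
A kernel-side handler might decode the three arithmetic exception parameters roughly as follows. The sketch takes the bit positions from Figures 4-1 through 4-3 (register n in mask bit n; SWC in summary bit 0 up through IOV in bit 6); the function names are hypothetical.

```python
# Summary register bit names, indexed by bit position 0 through 6.
SUMMARY_BITS = ["SWC", "INV", "DZE", "OVF", "UNF", "INE", "IOV"]


def decode_exception_summary(summary):
    """Return the names of the summary bits that are set."""
    return [name for bit, name in enumerate(SUMMARY_BITS) if summary & (1 << bit)]


def registers_in_mask(mask, prefix):
    """List the registers flagged in a 32-bit register mask.

    prefix is "F" for the floating-point mask or "R" for the integer mask.
    """
    return [f"{prefix}{n}" for n in range(32) if mask & (1 << n)]
```

For instance, a summary value with bits 0 and 2 set reports software completion together with a floating-point division by zero, and a floating-point mask with bits 0, 2, and 31 set names F0, F2, and F31 as destinations of faulting instructions.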
4.1.7.2 Unaligned Access Exceptions


Unaligned access exceptions are reported to and handled by the kernel and are precise. There- 
fore, the address written to the faulting instruction offset of the trap frame is the virtual 
address of the load or store instruction that accessed the unaligned address. 


The PALcode writes the following information into the exception record of the trap frame for 
an unaligned access exception, where er is the exception record pointer. 


ErExceptionCode(er) <- STATUS_DATATYPE_MISALIGNMENT
ErExceptionInformation<0>(er) <- Faulting opcode
ErExceptionInformation<1>(er) <- Destination register
ErExceptionInformation<2>(er) <- Unaligned virtual address
ErNumberParameters(er) <- 3
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0


4.1.7.3 Illegal Instruction Exceptions 


PALcode raises the following types of illegal operations as illegal instruction exceptions: 
• Attempt to execute an instruction with an opcode reserved to DIGITAL.
• Attempt to execute an instruction with an unimplemented PALcode function code.
• Attempt to execute a privileged PALcode instruction from user (unprivileged) mode.
• Attempt to execute an instruction with an illegal operand.
• Attempt to execute an unimplemented/subsetted instruction.


Note: 


Instructions with illegal operands cause illegal instruction exceptions to be raised only if 
the processor raises an exception for these operations. 


Illegal instruction exceptions are precise; the faulting address written into the trap frame is the 
virtual address of the instruction that caused the exception. 


The PALcode writes the following information into the exception record of the trap frame for 
an illegal instruction exception, where er is the exception record pointer. 


ErExceptionCode(er) <- STATUS_ILLEGAL_INSTRUCTION
ErNumberParameters(er) <- 0
ErExceptionFlags(er) <- 0


4.1.7.4 Invalid (Non-Canonical Virtual) Address Exceptions 


The PALcode raises a general exception if the PALcode detects an invalid faulting virtual 
address, that is, a faulting virtual address that is not a canonical longword. The implementation 
must test for the non-canonical format for both data stream and instruction stream translation 
buffer fills. 


For data stream faults, the faulting address written to the trap frame is the virtual address of 
the instruction that caused the reference to the invalid address. 


Instruction stream invalid addresses present a more difficult problem because the exception
address itself is invalid and cannot be properly interpreted by a 32-bit operating system. In the
case of instruction stream virtual addresses, the ra (return address) register minus 4 (ra-4) is
written to the faulting address field of the trap frame. The ra register is used because it probably
yields a sane address within the correct program that faulted. Also, (ra-4) is the most
probable faulting address, as the most likely instruction to have caused the fault is: jsr ra, (rx).


The PALcode writes the following information into the exception record of the trap frame for 
a non-canonical virtual address fault, where er is the exception record pointer. 


ErExceptionCode(er) <- STATUS_INVALID_ADDRESS
ErExceptionInformation<0>(er) <- 1 if store; 0 otherwise
ErExceptionInformation<1>(er) <- invalid va<63..32>
ErExceptionInformation<2>(er) <- invalid va<31..0>
ErNumberParameters(er) <- 3
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0
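

The record layout above can be illustrated with a short Python sketch. It is not PALcode; it only models how the 64-bit invalid address is split across two 32-bit parameters. The function names and the dictionary layout are hypothetical, and the canonical test assumes that a canonical longword is a 64-bit value equal to the sign extension of its low 32 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def is_canonical_longword(va):
    """Assumed test: va equals the sign extension of its low 32 bits."""
    va &= MASK64
    low = va & MASK32
    upper = 0xFFFFFFFF00000000 if low & 0x80000000 else 0
    return va == (low | upper)


def invalid_address_record(va, is_store):
    """Model the exception record fields written for a non-canonical address."""
    va &= MASK64
    return {
        "ExceptionCode": "STATUS_INVALID_ADDRESS",
        "ExceptionInformation": [
            1 if is_store else 0,   # <0>: 1 if store; 0 otherwise
            (va >> 32) & MASK32,    # <1>: invalid va<63..32>
            va & MASK32,            # <2>: invalid va<31..0>
        ],
        "NumberParameters": 3,
        "ExceptionFlags": 0,
        "ExceptionRecord": 0,
    }
```

Splitting the address this way lets a 32-bit operating system examine a 64-bit value using only longword fields.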


4.1.7.5 Software Exceptions 


Software may raise exceptions by using the unprivileged gentrap (generate trap) instruction.
The gentrap instruction is used to raise exceptions recognized (possibly) in user-mode software
for conditions such as divide by zero. (The Alpha architecture does not provide an
integer divide instruction; division is accomplished by specialized divide routines.)


The gentrap instruction takes a single parameter that is preserved but not interpreted by the
PALcode. The gentrap parameter is written into the exception record where it is interpreted by
the kernel exception handler. Gentrap uses the STATUS_ALPHA_GENTRAP status as an
exception code. The kernel exception dispatcher interprets the gentrap parameter to determine
the appropriate Windows NT Alpha status to raise to the currently executing thread.


The faulting address for a gentrap exception is the virtual address of the executed gentrap 
instruction. 


The PALcode writes the following information into the exception record for a gentrap
instruction, where er is the exception record pointer:


ErExceptionCode(er) <- STATUS_ALPHA_GENTRAP
ErExceptionInformation<0>(er) <- gentrap parameter (a0<31..0> upon execution of gentrap)
ErExceptionInformation<1>(er) <- gentrap parameter (a0<63..32> upon execution of gentrap)
ErNumberParameters(er) <- 2
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0


4.1.7.6 Breakpoint Exceptions


There are several breakpoint instructions and each raises a general exception. Several of these
breakpoints are implemented to support the kernel debugger and are essentially special subroutine
calls. The exact semantics of these calls are not important to the PALcode; all breakpoints
are handled in the same manner and are distinguished only by the breakpoint type that is written
into the exception record.


All breakpoints are implemented as unprivileged PALcode instructions, which allows the kernel
to decide whether the breakpoint can be taken in the current mode.


Table 4-3 lists the breakpoint mnemonics and their corresponding breakpoint types:


Table 4-3: Breakpoint Types 


Mnemonic  Type               Description

bpt       USER_BREAKPOINT    User breakpoint
kbpt      KERNEL_BREAKPOINT  Kernel breakpoint
callkd    Passed in v0       Call kernel debugger


The faulting instruction address for all breakpoints is the virtual address of the breakpoint 
instruction. 


PALcode completes the exception record for breakpoints as follows, where er is the exception 
record pointer: 


ErExceptionCode(er) <- STATUS_BREAKPOINT
ErExceptionInformation<0>(er) <- breakpoint type
ErNumberParameters(er) <- 1
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0


4.1.7.7 Subsetted IEEE Instruction Exceptions 


Floating-point instructions are always enabled. Therefore, FEN (floating enable) faults are not 
supported. 


Hardware Implementation Note: 


Windows NT Alpha requires implementation of IEEE floating-point in each processor 
implementation. The PALcode raises an illegal instruction exception for any subsetted 
IEEE floating-point instruction — that is, for any IEEE floating-point instruction not 
implemented in hardware. 


VAX floating-point format is not supported. 


DIGITAL Restricted Distribution 




4.1.7.8 General Exceptions: Common Operations 


The common operations for all general exceptions are as follows. 


previousPSR <- PSR
if ( PSR<Mode> EQ User ) then
    PSR<Mode> <- Kernel
    tp <- (IKSP - TrapFrameLength) ! Establish trap pointer
else
    tp <- (sp - TrapFrameLength) ! Establish trap pointer
endif
TrIntSp(tp) <- sp
TrIntFp(tp) <- fp
TrIntGp(tp) <- gp
TrIntRa(tp) <- ra
TrIntA0(tp) <- a0
TrIntA1(tp) <- a1
TrIntA2(tp) <- a2
TrIntA3(tp) <- a3
TrPsr(tp) <- previousPSR
TrFir(tp) <- ExceptionPC
sp <- tp
RESTART_ADDRESS <- GENERAL_ENTRY
fp <- sp
gp <- KGP
a0 <- tp + TrExceptionRecord ! pointer to exception record
a3 <- previousPSR


All other general-purpose registers must be preserved across the general exception dispatch. 
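

The common operations can be modeled in a few lines of Python to show how the trap pointer is chosen and which registers are saved. This is only a sketch: the processor is represented as a plain dictionary, and the values of TRAP_FRAME_LENGTH and TR_EXCEPTION_RECORD are hypothetical placeholders, not architected constants.

```python
TRAP_FRAME_LENGTH = 0x200      # hypothetical size, for illustration only
TR_EXCEPTION_RECORD = 0x100    # hypothetical offset of the exception record
SAVED = ("sp", "fp", "gp", "ra", "a0", "a1", "a2", "a3")


def dispatch_general_exception(cpu, exception_pc):
    """Model the common general-exception operations on a dict-based CPU."""
    previous_psr = dict(cpu["psr"])
    if cpu["psr"]["mode"] == "User":
        cpu["psr"]["mode"] = "Kernel"
        tp = cpu["iksp"] - TRAP_FRAME_LENGTH          # frame on kernel stack
    else:
        tp = cpu["regs"]["sp"] - TRAP_FRAME_LENGTH    # frame on current stack
    frame = {name: cpu["regs"][name] for name in SAVED}
    frame["psr"] = previous_psr
    frame["fir"] = exception_pc
    cpu["memory"][tp] = frame
    regs = cpu["regs"]
    regs["sp"] = tp
    regs["fp"] = tp
    regs["gp"] = cpu["kgp"]
    regs["a0"] = tp + TR_EXCEPTION_RECORD   # pointer to exception record
    regs["a3"] = previous_psr
    cpu["restart_address"] = "GENERAL_ENTRY"
    return tp
```

The point of the model is the single branch: an exception from user mode builds its frame below the initial kernel stack pointer, while an exception from kernel mode builds it below the current stack pointer.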


4.1.8 Panic Exceptions 


Severe problems produce panic exceptions. Severe problems are not recoverable; the operating
system cannot continue executing normally. Panic exception handling shuts down the
machine in a controlled manner that assists in debugging the problem. With the exception of
hardware errors, panic exceptions are not expected to occur in the production operating system.


The PALcode raises a panic exception to the kernel and describes the condition that causes the 
panic with a bugcheck code. When the kernel receives a panic exception, it enters the kernel 
debugger if it is enabled. 


The classes of panic exceptions are:

• Kernel stack corruption
• Unexpected exceptions in PALcode


4.1.8.1 Kernel Stack Corruption


The PALcode can recognize the following types of kernel stack corruption: invalid kernel
stack, kernel stack overflow, and kernel stack underflow. The kernel stack for an executing
thread must always be valid. The PALcode raises a panic exception if the processor faults
when accessing the kernel stack and the page tables indicate that the kernel stack address is
not valid. The PALcode may also check for kernel stack underflow and overflow and raise a
panic exception if either condition is detected.


The kernel stack is the two pages of virtual address space below the IKSP for a thread, where
the IKSP points to the byte beyond the top of the two pages. When raising a kernel stack corruption
exception, the PALcode sets the bugcheck code to PANIC_STACK_SWITCH.
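

The stack bounds described above can be expressed as a small Python sketch. It assumes an 8 KB page size and a stack that grows toward lower addresses; both are assumptions made for illustration, and the function names are hypothetical. The overflow and underflow checks are optional in PALcode, so this only shows one way such a check could be phrased.

```python
PAGE_SIZE = 8 * 1024        # assumption: 8 KB pages
KERNEL_STACK_PAGES = 2      # from the text: two pages below the IKSP


def kernel_stack_range(iksp):
    """Return (lowest stack address, first address beyond the stack)."""
    return iksp - KERNEL_STACK_PAGES * PAGE_SIZE, iksp


def classify_kernel_sp(sp, iksp):
    """Classify a stack pointer against the two-page kernel stack."""
    low, high = kernel_stack_range(iksp)
    if sp < low:
        return "overflow"    # assumed downward growth: below the lowest page
    if sp > high:
        return "underflow"   # above the initial kernel stack pointer
    return "ok"
```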


4.1.8.2 Unexpected Exceptions 


The PALcode may raise a panic exception when it detects an unexpected condition caused by 
PALcode. Such unexpected conditions are implementation dependent. It is anticipated that 
those conditions indicate a bug in the PALcode or that the processor is no longer executing 
correctly. The PALcode raises the bugcheck code TRAP_CAUSE_UNKNOWN. 


4.1.8.3 Panic Exception Trap Frame and Dispatch 


The PALcode builds a trap frame for the kernel before it dispatches. The PALcode also fills in 
the exception record that exists within the trap frame. 
The PALcode attempts to maintain all possible register state in order to assist in debugging. 


The PALcode performs the following operations when dispatching a panic exception to the 
kernel: 


previousPSR <- PSR
if ( PSR<Mode> EQ User ) then
    PSR<Mode> <- Kernel
endif
panicStack <- PcPanicStack(PCR) ! Get the panic stack
tp <- (panicStack - TrapFrameLength) ! Allocate trap frame on panic stack
TrIntSp(tp) <- sp
TrIntFp(tp) <- fp
TrIntGp(tp) <- gp
TrIntRa(tp) <- ra
TrIntA0(tp) <- a0
TrIntA1(tp) <- a1
TrIntA2(tp) <- a2
TrIntA3(tp) <- a3
TrPsr(tp) <- previousPSR
TrFir(tp) <- ExceptionPC
sp <- tp
fp <- sp
gp <- KGP
a0 <- NT bugcheck code
a1 <- Exception address
a2, a3, a4 <- Bugcheck parameters
RestartAddress <- PANIC_ENTRY


All other general-purpose registers must be preserved across the panic exception dispatch. 


4.2 Interrupts


The PALcode supports two software interrupt levels and an implementation-specific limit of
hardware interrupt sources. The Windows NT Alpha PALcode supports eight levels of interrupt
priority known as interrupt request levels (IRQL). The supported IRQLs are numbered
0-7.


The platform independence of interrupt dispatch is accomplished via three tables: Interrupt 
Level Table, Interrupt Mask Table, and Interrupt Dispatch Table. 


4.2.1 Interrupt Level Table (ILT) 


The Interrupt Level Table consists of eight entries, indexed 0-7. The index values and symbols
for the entries are described in Table 2-1. Each table entry corresponds to an IRQL by its
index within the table. The value of each entry is an enable value that indicates which interrupt
sources are to be enabled within the processor for the corresponding IRQL. One full longword
is reserved for each table entry. The interpretation of the bits within the enable mask is processor
specific.


Implementation Note (Software): 


The Interrupt Level Table is probably the most important optional set of data that can be 
cached within the processor. Implementations should consider implementing a PALcode 
instruction that causes the ILT to be reread and recached within the processor. Some 
processors may have an effectively hardwired ILT. In such a case, the HAL has no 
influence over which interrupts are enabled for each IRQL. 


4.2.2 Interrupt Mask Table (IMT) 


The Interrupt Mask Table relates a mask value of requested interrupts to both an interrupt vector
and a synchronization IRQL. The table resolves implicit interrupt priorities because only
one interrupt vector can be assigned for each request mask. The IMT is divided into two
sub-tables as described in Table 4-4.


Index Range  Interrupt Source Description

0-3          Software (2 sources)
4-131        Hardware


Each entry in the table is a longword that consists of two word values: the interrupt vector 
number and the synchronization level. The use of the software portion of the table is strictly 
defined and consistent across all processor implementations. 


Implementation Note: 


In an implementation, the relation between pending interrupts and their interrupt vectors 
and synchronization levels may be hardwired. In that case, the IMT is not used and the 
HAL is not able to influence the setting of priority or assignment of interrupts. 


The software entries are used only if no hardware interrupts are pending. The entries must be
initialized so that deferred procedure call (DPC) software interrupts are higher priority than
asynchronous procedure call (APC) software interrupts. The expected initialization of the software
portion of the IMT is defined in Table 4-5.


Table 4-5: Software Entries of the IMT


Index  Synchronization Level  Vector

0      PASSIVE_LEVEL = 0      Passive release vector
1      APC_LEVEL = 1          APC dispatch vector
2      DISPATCH_LEVEL = 2     DPC dispatch vector
3      DISPATCH_LEVEL = 2     DPC dispatch vector


The hardware portion of the IMT is designed for flexible use. Each implementation must
define a relation f that defines a mapping of requested and enabled hardware interrupt sources
to entries in the IMT. The relation f is implementation specific, but f must be a function in
the mathematical sense (for each input there is a single unambiguous result). All interrupts
other than software interrupts are considered hardware interrupts. Hardware interrupts can
include external interrupt signals, performance counter interrupts, and correctable read
interrupts.


4.2.3 Interrupt Dispatch Table (IDT)


The Interrupt Dispatch Table (IDT) has an entry for each possible interrupt vector. The possible
interrupt vectors are in the range 0-255. Each entry is a longword pointer, which is the
virtual address of the interrupt dispatch routine for the vector that corresponds to the index of
the entry within the table. The PALcode does not read or write the IDT; it is maintained and
used entirely by the kernel and HAL.


4.2.4 Interrupt Dispatch 


Interrupt dispatch within the PALcode goes through the following steps: 


! Mask of requested (irr) and enabled (ier) interrupt sources:
irm <- irr AND ier
! Retrieve value from interrupt mask table:
CASE
    Hardware Interrupt Pending:
        index = f(irm)
        sirql <- (IMT<{index*4}>)<Synchronization IRQL>
        vector <- (IMT<{index*4}>)<InterruptVector>
    Software Interrupt Pending:
        sirql <- (IMT<{irm*4}>)<Synchronization IRQL>
        vector <- (IMT<{irm*4}>)<InterruptVector>
    Otherwise:
        Passive release, restart execution
ENDCASE
Set processor to sirql IRQL

if ( processor interrupt ) then
    { acknowledge the interrupt }
endif


Once synchronization level has been set and the interrupt service routine has been determined,
the PALcode builds a trap frame and dispatches to the kernel interrupt exception handler, passing
in the interrupt vector.
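

The IMT lookup can be illustrated with a short Python sketch. The software entries follow Table 4-5. Everything else is an assumption made for the example: the layout of the request mask (software requests in bits 0-1, hardware sources above them), the function example_f (one hypothetical choice of f in which the highest-numbered pending source wins), and the representation of the hardware portion of the IMT as a dictionary.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2

# Software portion of the IMT (indexes 0-3), following Table 4-5.
# Each entry is (synchronization IRQL, vector name).
SOFTWARE_IMT = [
    (PASSIVE_LEVEL, "passive release"),
    (APC_LEVEL, "APC dispatch"),
    (DISPATCH_LEVEL, "DPC dispatch"),
    (DISPATCH_LEVEL, "DPC dispatch"),
]

SOFTWARE_MASK = 0b11   # assumption: bits 0-1 of irm are the software requests


def example_f(hw_mask):
    """Hypothetical f: the highest-numbered pending hardware source wins.

    Returns an index into the hardware portion of the IMT (4 and up).
    """
    return 4 + hw_mask.bit_length() - 1


def resolve_interrupt(irr, ier, hardware_imt):
    """Return (synchronization IRQL, vector), or None for a passive release."""
    irm = irr & ier
    hw_mask = irm >> 2
    if hw_mask:                      # hardware interrupts take precedence
        return hardware_imt[example_f(hw_mask)]
    if irm & SOFTWARE_MASK:          # software entries are indexed by irm
        return SOFTWARE_IMT[irm & SOFTWARE_MASK]
    return None
```

Because index 3 (both software requests pending) maps to the DPC dispatch vector, the table itself encodes the rule that DPC interrupts outrank APC interrupts; no separate priority logic is needed.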


In the case of software interrupts: 


previousPSR <- PSR
if ( PSR<Mode> EQ User ) then
    PSR<Mode> <- Kernel
    tp <- (IKSP - TrapFrameLength) ! Establish trap pointer
else
    tp <- (sp - TrapFrameLength) ! Establish trap pointer
endif
TrIntSp(tp) <- sp
TrIntFp(tp) <- fp
TrIntGp(tp) <- gp
TrIntA0(tp) <- a0
TrIntA1(tp) <- a1
TrIntA2(tp) <- a2
TrIntA3(tp) <- a3
TrFir(tp) <- ExceptionPC
TrPsr(tp) <- previousPSR
TrIntRa(tp) <- ra
sp <- tp
fp <- sp
gp <- KGP
a0 <- interrupt vector
a1 <- PCR
a2 <- synchronization IRQL
a3 <- previousPSR
RestartAddress <- INTERRUPT_ENTRY


In the case of hardware interrupts: 


previousPSR <- PSR
if ( PSR<Mode> EQ User ) then
    PSR<Mode> <- Kernel
    tp <- (IKSP - TrapFrameLength) ! Establish trap pointer
else
    tp <- (sp - TrapFrameLength) ! Establish trap pointer
endif
TrIntSp(tp) <- sp
TrIntFp(tp) <- fp
TrIntGp(tp) <- gp
TrIntA0(tp) <- a0
TrIntA1(tp) <- a1
TrIntA2(tp) <- a2
TrIntA3(tp) <- a3
TrFir(tp) <- ExceptionPC
TrPsr(tp) <- previousPSR
TrIntRa(tp) <- ra
sp <- tp
fp <- sp
gp <- KGP
a0 <- interrupt vector
a1 <- PCR
a2 <- synchronization IRQL
a3 <- previousPSR
RestartAddress <- INTERRUPT_ENTRY


All other general-purpose register values must be preserved across interrupt dispatch. 
The kernel uses the rfe instruction to restart the interrupted code sequence. 


4.2.5 Interrupt Acknowledge 


Interrupts are acknowledged according to their origin. Internal processor interrupts, such as
software interrupts and performance counters, are acknowledged by the PALcode. System-level
interrupts are acknowledged in the native interrupt dispatch routines.


4.2.6 Synchronization Functions


The swpirql, di, and ei instructions allow the kernel to affect the processor’s current interrupt
enable state:

• Swpirql swaps the current interrupt request level (IRQL) of the processor. Swpirql takes
  the new IRQL as a parameter and returns the previous IRQL.
• Di disables all interrupts without changing the current IRQL.
• Ei enables interrupts at the currently set IRQL.


Those instructions and the existence of the interrupt enable bit in the PSR are used as a global 
interrupt enable for all interrupts. 
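

The interaction between the IRQL and the global interrupt enable can be sketched as a small Python model. The class and its can_deliver rule are hypothetical: the rule assumes that a source is deliverable only when its level is above the current IRQL, whereas a real implementation derives the enabled sources from the Interrupt Level Table.

```python
class InterruptState:
    """Toy model of the PSR interrupt-enable bit and IRQL field."""

    def __init__(self, irql=0, ie=True):
        self.irql = irql
        self.ie = ie

    def swpirql(self, new_irql):
        """Set a new IRQL (0-7) and return the previous one."""
        if not 0 <= new_irql <= 7:
            raise ValueError("IRQL must be in the range 0-7")
        previous, self.irql = self.irql, new_irql
        return previous

    def di(self):
        """Disable all interrupts; the IRQL is left unchanged."""
        self.ie = False

    def ei(self):
        """Enable interrupts at the currently set IRQL."""
        self.ie = True

    def can_deliver(self, source_irql):
        """Assumed rule: deliverable only if enabled and above the IRQL."""
        return self.ie and source_irql > self.irql
```

A typical kernel usage pattern under this model is to save the value returned by swpirql and pass it back to swpirql when the protected region ends.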


4.2.7 Software Interrupt Requests 


The PALcode includes the software interrupt request register (SIRR), an architected internal 
processor register, for controlling software interrupt requests. The PALcode also includes two 
instructions, ssir and csir, to control the state of the SIRR register. 


The format of the SIRR is shown in Figure 4-4 and the fields are defined in Table 4-6.


Figure 4-4: Software Interrupt Request Register


Table 4-6: Software Interrupt Request Register Fields


Field  Type  Description

DPC    RW    DPC software interrupt requested
APC    RW    APC software interrupt requested


The ssir and csir instructions affect the state of software interrupt requests. 


The ssir instruction sets software interrupt requests by taking as a parameter the interrupt 
request levels to be set. Setting the appropriate bit in SIRR indicates that the corresponding 
software interrupt is requested. The csir instruction clears software interrupt requests by taking 
as a parameter the interrupt request level to be cleared. Clearing the appropriate bit in SIRR 
indicates that the corresponding software interrupt request has been cleared. 
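

The effect of ssir and csir on the request bits can be shown with a short Python sketch. The csir description elsewhere in this manual tests a0<1> for DPC and a0<0> for APC; the sketch assumes that ssir uses the same parameter layout and, purely for illustration, that the SIRR holds the two requests in the same bit positions.

```python
APC_BIT = 1 << 0   # a0<0> selects the APC request
DPC_BIT = 1 << 1   # a0<1> selects the DPC request
REQUEST_MASK = APC_BIT | DPC_BIT


def ssir(sirr, a0):
    """Return the SIRR value after setting the requests selected by a0."""
    return sirr | (a0 & REQUEST_MASK)


def csir(sirr, a0):
    """Return the SIRR value after clearing the requests selected by a0."""
    return sirr & ~(a0 & REQUEST_MASK)
```

Both operations touch only the selected bits, so a pending DPC request survives a csir that clears only the APC request.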


4.3 Machine Checks 


Machine checks are initiated when the hardware detects a hardware error condition. However, 
machine checks are not the only way that detected hardware errors are reported. Hardware 
error conditions can be reported from three sources: 


• At the pin level. Hardware may choose to signal errors via hardware interrupts. PALcode
  delivers such hardware error interrupts to the kernel as standard interrupts, where
  they may be hooked by the HAL for system-specific processing. Such interrupts are not
  processed by the PALcode as machine checks and are not described in this section.

• From an implementation-dependent internal error interrupt. It is an implementation
  decision whether to deliver such an interrupt as a standard interrupt or as a machine
  check. The processing of an interrupt that is delivered as a machine check is described
  in this section.

• At the machine check hardware vector. Hardware errors that are signaled by the processor
  through a specific machine check hardware vector are considered machine checks
  and are described in this section.


The machine check condition may be correctable or uncorrectable. If uncorrectable, the hardware
may choose to retry the operation that returned the error.


The PALcode recognizes the following types of machine checks: 


• Correctable errors
• Uncorrectable errors
• Catastrophic errors


4.3.1 Correctable Errors


Processor correctable errors are data errors that are detected by the processor and can be reliably
corrected. System correctable errors are detected and corrected by the system hardware;
incorrect data is not read into the processor.


Correctable errors are maskable by the MCES internal processor register (Figure 4-5). It is recommended
that correctable errors be disabled during PALcode initialization and subsequently
be explicitly enabled by the HAL. Correctable errors are delivered from the PALcode to allow
the HAL to log the errors. The PALcode builds a logout frame with per-processor information
that assists the HAL in logging the error.


4.3.2 Uncorrectable Errors 


Uncorrectable errors from the processor are detected by the processor and exhibit data errors
that cannot be reliably corrected. Actual processor uncorrectable errors are defined by the processor
implementation. Uncorrectable errors from the system are detected but not corrected by
the system hardware.


Although uncorrectable errors are likely also to be unrecoverable, a mechanism exists in the 
exception record to allow one or more retries when appropriate. The HAL controls the retry 
count. For example, a parity error in the I-cache, although uncorrectable, may disappear after 
an operation retry. 


The machine check exception is raised to the HAL to allow per-platform error handling. 
Uncorrectable errors are delivered immediately upon detection. The PALcode creates a logout 
frame with per-processor information to assist the HAL in handling the error condition. 


4.3.3 Machine Check Error Handling 


The general model for machine check handling has the following flow:

1. The PALcode corrects the error, if possible.
2. The PALcode sets the machine to a known state from which restart is possible.
3. The PALcode builds a logout frame describing the detected error.
4. The PALcode sets processor IRQL appropriately (see below).
5. The PALcode dispatches a general exception to the kernel.
6. In the case of a catastrophic error, PALcode returns control to the firmware, as
   described in Section 4.3.4.


The machine check error summary (MCES) register, Figure 4-5, indicates and controls the
current state of the machine check handler for the processor. Table 4-7 describes the MCES
register.


Figure 4-5: Machine Check Error Summary


Table 4-7: Machine Check Error Summary Fields


Field  Type  Description

DMK    RW    Disable all machine checks
DSC    RW    Disable system correctable error reporting
DPC    RW    Disable processor correctable error reporting
PCE    RW    Processor correctable error reported
SCE    RW    System correctable error reported
MCK    RW    Machine check (uncorrectable) reported (see Section 4.3.4)


All machine checks (correctable and uncorrectable) are maskable via the DMK bit in the 
MCES register. This bit is provided only for debugging systems. 


The initial value in MCES is implementation specific but, wherever possible, PALcode
attempts to preserve the state of machine check enables from the previous PALcode environment
during initialization.


PALcode writes the exception record with the following values for a machine check, where er 
is the exception record pointer. 


ErExceptionCode(er) <- DATA_BUS_ERROR
ErExceptionInformation<0>(er) <- machine check type
ErExceptionInformation<1>(er) <- pointer to logout frame
ErNumberParameters(er) <- 2
ErExceptionFlags(er) <- 0
ErExceptionRecord(er) <- 0


The two-bit mask that shows the machine check type is shown in Table 4-8. 


Table 4-8: Machine Check Types


Machine Check Type             Mask Value (Bits 0:1)

Uncorrectable with no retries  00
Correctable                    01
Uncorrectable with retries     10
Reserved                       11
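

A handler that receives the machine check type in the first exception parameter could decode it along the following lines. This Python sketch is illustrative only: the function name is hypothetical, and it assumes that the mask values in Table 4-8 are written as ordinary binary numerals, with the more significant bit on the left.

```python
MACHINE_CHECK_TYPES = {
    0b00: "uncorrectable with no retries",
    0b01: "correctable",
    0b10: "uncorrectable with retries",
}


def decode_machine_check_type(exception_information_0):
    """Decode the two-bit machine check type; 0b11 is reserved."""
    value = exception_information_0 & 0b11
    if value not in MACHINE_CHECK_TYPES:
        raise ValueError("reserved machine check type")
    return MACHINE_CHECK_TYPES[value]
```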


The virtual address of the logout frame is a 32-bit superpage address, and the logout frame has
a per-processor format.


The draina instruction, when coupled with appropriate implementation-specific native code,
can allow software to force completion of all previously executed instructions, such that the
previous instructions cannot cause machine checks to be signaled while any instructions subsequent
to the draina are executed.


4.3.4 Catastrophic Errors 


Although particular catastrophic conditions are specific to the processor implementation, such 
conditions indicate that the machine is left in a state where execution cannot be reliably 
restarted. They also indicate that the hardware cannot be trusted to execute properly or the 
state of data within the system cannot be determined. 


An example of a catastrophic condition is a machine check taken while machine check handling
is in progress, as indicated by a set MCK bit in the MCES register. Taking a machine
check while in the PALcode environment is also considered catastrophic. In those cases, control
is returned to the firmware as follows:


1. Further machine check acknowledgement is turned off and a logout frame is generated.
2. The restart block is verified:

   - If the restart block is good, the current state in the restart block is saved, the previous
     state is restored, and control is returned to the firmware at the restart address.

   - If the restart block is bad, the alternate path is used to re-execute the previous PALcode
     image at its entry address. See Section 6.2.1.


4.4 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP -> Alpha
2. Windows NT AXP -> Windows NT Alpha
3. Digital -> DIGITAL
4. Updated for NT 4.0
5. Removed interrupt stack references


Revision 6.0, December 12, 1994
1. Created from Windows NT Alpha specifications




Chapter 5 


PALcode Instruction Descriptions (II-C) 


The PALcode instructions generally follow the Windows NT Alpha calling standard. Arguments
are passed in the argument (a0-a5) registers and return values are returned in the value
(v0) register. The PALcode instructions also incorporate the following conventions into their
own calling standard:


• Unless specific temporary registers are required, only the argument registers a0-a5 are
  considered volatile.
• Generally, all parameters are passed in registers.


The argument registers are used as volatile registers because often they contain parameters to
the PALcode instructions. In strict adherence to the calling standard, the temporary registers
t0-t12 could also be considered volatile in the PALcode instructions, but they are not. The
temporary registers are not considered necessarily volatile because PALcode instructions generally
do not need more free registers. Further, it is convenient in assembly language, from
which the PALcode instructions are most frequently called, to be able to assume that temporary
registers are preserved across the PALcode instruction.


All parameters to the PALcode instructions are passed in registers. If the number of parameters
exceeds the available number of argument registers, additional temporary registers are
used as arguments. This precludes the need for callers to build an appropriate stack frame for
PALcode instructions with more than six parameters.


The RESTART_ADDRESS register indicates the next execution address when the PALcode 
exits. Upon entry to each of the PALcode instructions, the RESTART_ADDRESS register is 
considered to contain the address of the instruction immediately following the PALcode 
instructions. 


A range of privileged PALcode instructions is reserved for processor-implementation-specific 
PALcode instructions that allow specialized communication between the HAL and the 
PALcode. 


Note: 


The Operation part of the PALcode instruction descriptions is shown as an ordered
sequence of instructions. The instructions in the sequence may be reordered as long as the
results of the sequence of instructions are not altered. In particular, if an instruction j is
listed subsequent to an instruction i and i writes any data that is used by j, then i must be
executed before j.
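

The ordering rule in this note can be checked mechanically. The Python sketch below is a hypothetical helper, not part of any PALcode tool: each operation is described by the set of names it reads and the set it writes, and a proposed execution order is accepted only if every writer still precedes the later-listed operations that use its data. It covers only the dependency named in the note; the broader requirement that results are not altered may impose further constraints.

```python
def reordering_is_safe(ops, order):
    """Check a reordering against the write-before-use rule.

    ops   -- list of (reads, writes) pairs of sets, in listed order
    order -- proposed execution order, as a permutation of indexes into ops
    """
    position = {op: pos for pos, op in enumerate(order)}
    for j in range(len(ops)):
        for i in range(j):
            i_writes = ops[i][1]
            j_reads = ops[j][0]
            # i is listed before j and writes data used by j,
            # so i must also execute before j.
            if i_writes & j_reads and position[i] > position[j]:
                return False
    return True
```

For example, with operations modeled on a dispatch sequence (previousPSR <- PSR, then TrPsr(tp) <- previousPSR, then gp <- KGP), moving the third operation first is accepted, while swapping the first two is rejected.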


5.1 Privileged PALcode Instructions


Table 5-1 summarizes the privileged PALcode instructions.


Table 5-1: Privileged PALcode Instruction Summary


Mnemonic    Description

csir        Clear software interrupt request
dalnfix     Disable alignment fixups
di          Disable interrupts
draina      Drain aborts
dtbis       Data translation buffer invalidate single
ealnfix     Enable alignment fixups
ei          Enable interrupts
halt        Halt the processor
initpal     Initialize the PALcode
initpcr     Initialize PCR data
rdcounters  Read PALcode event counters
rdirql      Read current IRQL
rdksp       Read initial kernel stack
rdmces      Read machine check error summary
rdpcr       Read processor control region address
rdpsr       Read processor status register
rdstate     Read internal processor state
rdthread    Read the current thread value
reboot      Transfer to console or previous PALcode environment
restart     Restart the processor
retsys      Return from system service call
rfe         Return from exception
ssir        Set software interrupt request
swpctx      Swap privileged thread context
swpirql     Swap IRQL
swpksp      Swap initial kernel stack
swppal      Swap PALcode
swpprocess  Swap privileged process context
tbia        Translation buffer invalidate all
tbim        Translation buffer invalidate multiple
tbimasn     Translation buffer invalidate multiple for ASN
tbis        Translation buffer invalidate single
tbisasn     Translation buffer invalidate for single ASN
wrentry     Write system entry
wrmces      Write machine check error summary
wrperfmon   Write performance monitoring values


5.1.1 Clear Software Interrupt Request


Format:


csir ! PALcode format


Operation:


{ a0 = Software interrupt requests to clear }


if ( PSR<Mode> EQ User ) then
    { Initiate illegal instruction exception }
endif
if ( a0<1> EQ 1 ) then
    SIRR<DPC> <- 0
endif
if ( a0<0> EQ 1 ) then
    SIRR<APC> <- 0
endif


GPR State Change:


a0-a5 are UNPREDICTABLE


IPR State Change:


SIRR <- 0 according to a0


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The csir instruction clears the specified bit in the SIRR internal processor register, depending 
on the contents of a0. See Section 4.2.7. 


5.1.2 Disable Alignment Fixups


Format:


dalnfix ! PALcode format


Operation:


if ( PSR<Mode> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Implementation-specific state is set to generate alignment fault }
{ exceptions and to prevent alignment fault fixups by the PALcode }


GPR State Change: 


None 


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The dalnfix instruction disables alignment fault fixups in PALcode and generates alignment
fault exceptions whenever an alignment fault occurs. After dalnfix is executed on a processor,
all alignment faults on that processor are not fixed up by PALcode and alignment fault exceptions
are dispatched to the kernel until the ealnfix instruction is executed on that processor.


5.1.3 Disable All Interrupts


Format:


di ! PALcode format


Operation:


if ( PSR<Mode> EQ User ) then
    { Initiate illegal instruction exception }
endif
PSR<IE> <- 0


GPR State Change:


None


IPR State Change:


PSR<IE> <- 0


Exceptions:


Illegal Instruction
Machine Checks


Description: 


The di instruction disables all interrupts by clearing the interrupt enable bit (IE) in the PSR 
internal processor register. The IRQL field is unaffected. Interrupts may be re-enabled via the 
ei instruction. 


5.1.4 Drain All Aborts Including Machine Checks


Format:


draina ! PALcode format


Operation:


if ( PSR<Mode> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Implementation-specific drain }


GPR State Change: 


None 


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The draina instruction facilitates the draining of all aborts, including machine checks, from the 
current processor. When coupled with the appropriate implementation-specific native code, 
draina can help guarantee that no abort is signaled for an instruction issued before the draina 
while any instruction issued subsequent to the draina is executing. 


5.1.5 Data Translation Buffer Invalidate Single


Format:


dtbis ! PALcode format


Operation:


{ a0 = Virtual address of translation to invalidate }


if ( PSR<Mode> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Invalidate all translations in the data stream for the }
{ virtual address in a0 }


GPR State Change:


a0-a5 are UNPREDICTABLE


IPR State Change:


None


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The dtbis instruction invalidates a single data stream translation. The translation for the virtual 
address in a0 must be invalidated in all data translation buffers and in all virtual data caches. 
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5.1.6 Enable Alignment Fixups 


Format: 


ealnfix ! PALcode format


Operation: 


if ( PSR<Mode> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
{ Implementation-specific state is set to fix up alignment faults }
{ by the PALcode }


GPR State Change: 


None 


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The ealnfix instruction enables alignment fault fixups in PALcode and prevents alignment
fault exceptions. After ealnfix is executed on a processor, all alignment faults on that
processor are fixed up by PALcode, and no alignment fault exceptions are dispatched to the
kernel until the dalnfix instruction is executed on that processor.


By default, alignment fixups in PALcode are disabled.
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5.1.7 Enable Interrupts 


Format: 
ei ! PALcode format 


Operation: 


if ( PSR<MODE> EQ User ) then 

{ Initiate illegal instruction exception } 
endif 
PSR<IE> ← 1


GPR State Change: 


None 


IPR State Change: 
PSR<IE> ← 1


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The ei instruction sets the interrupt enable (IE) bit in the PSR internal processor register, thus 
enabling those interrupts that are at the appropriate level for the current IRQL field in the PSR. 
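
The effect of di and ei on the PSR can be illustrated with a small Python model. This is
only a sketch: the Cpu class, the IllegalInstruction exception, and the mode encodings
(USER_MODE = 1, KERNEL_MODE = 0) are assumptions made for the example and are not defined
by this section.

```python
USER_MODE = 1    # assumed encoding of PSR<MODE> = User (illustrative only)
KERNEL_MODE = 0  # assumed encoding of PSR<MODE> = Kernel (illustrative only)


class IllegalInstruction(Exception):
    """Stands in for the illegal instruction exception."""


class Cpu:
    """Hypothetical model holding only the PSR fields used by di and ei."""

    def __init__(self, mode=KERNEL_MODE, ie=1, irql=0):
        self.mode = mode
        self.ie = ie
        self.irql = irql

    def _require_kernel(self):
        # Privileged PALcode calls trap when issued from user mode.
        if self.mode == USER_MODE:
            raise IllegalInstruction("privileged PALcode call from user mode")

    def di(self):
        self._require_kernel()
        self.ie = 0  # PSR<IE> <- 0; the IRQL field is left unchanged

    def ei(self):
        self._require_kernel()
        self.ie = 1  # PSR<IE> <- 1; the IRQL field is left unchanged
```

In this model, which interrupts are actually delivered once IE is set would still depend
on the IRQL field, which neither call modifies.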
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5.1.8 Halt the Operating System by Trapping to Illegal Instruction 


Format: 


halt ! PALcode format 


Operation: 


{ Initiate illegal instruction exception } 


GPR State Change: 


See Section 4.1.7.3 for illegal instruction exception handling. 


IPR State Change: 


See Section 4.1.7.3 for illegal instruction exception handling. 


Exceptions: 


Illegal Instruction 


Description: 


The halt instruction forces an illegal instruction exception. See the reboot instruction, Section 
5.1.19, for transferring control to the console or previous PALcode environment. 
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5.1.9 Initialize PALcode Data Structures with Operating System Values 


Format: 

initpal ! PALcode format
Operation: 

{ a0 = Page directory entry (PDE) page, superpage 32 address }
{ a1 = Initial thread value }
{ a2 = Initial TEB value }
{ gp = Kernel global pointer }
{ sp = Initial kernel stack pointer }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif

PDR ← ( a0 BIC 80000000₁₆ )
THREAD ← a1
TEB ← a2
IKSP ← sp
KGP ← gp

PcPalBaseAddress(PCR) ← PAL_BASE
PcPalMajorVersion(PCR) ← PalMajorVersion
PcPalMinorVersion(PCR) ← PalMinorVersion
PcPalSequenceVersion(PCR) ← PalSequenceVersion
PcPalMajorSpecification(PCR) ← PalMajorSpecification
PcPalMinorSpecification(PCR) ← PalMinorSpecification

v0 ← PAL_BASE


GPR State Change: 
v0 ← PAL_BASE
a0-a5 are UNPREDICTABLE


IPR State Change: 
PDR ← a0
THREAD ← a1
TEB ← a2
IKSP ← sp
KGP ← gp


Exceptions: 


Illegal Instruction 
Machine Checks 
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Description: 


The initpal instruction is called early in the kernel initialization sequence to establish the
IPR values for the initial thread: PDR, THREAD, TEB, and IKSP. The IPR value KGP persists for
the life of the system. In addition, initpal writes the PALcode version information into the PCR.


On return from the initpal instruction, the return value register, v0, contains the value of
the PAL_BASE register, which is the PALcode base address in 32-bit superpage (kseg0) format.
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5.1.10 Initialize Processor Control Region Data 


Format: 


initpcr ! PALcode format 


Operation: 


if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
{ Cache portions of Interrupt Level Table and Processor Control Region } 
{ data in implementation-dependent manner } 


GPR State Change: 
a0-a4 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The initpcr instruction caches process-specific information, including parts of the Interrupt 
Level Table (ILT), for use by the PALcode. See Section 6.1.4 for information on the ILT. 
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5.1.11 Read the Software Event Counters 


Format: 
rdcounters ! PALcode format 
Operation: 
{ a0 = Pointer to 32-bit superpage address of counter record buffer. }
{      Address must be quadword aligned }
{ a1 = Length of buffer in bytes }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Dump event counter values to the counter record }
v0 ← status


GPR State Change: 


v0 ← status
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


For debug PALcode (see Section 5.3), rdcounters causes that PALcode to write the state of its 
internal software event counters into an implementation-specific counter record pointed to by 
the address passed in the a0 register. For production PALcode, rdcounters returns a status 
value of zero, indicating that it is not implemented in the current PALcode image. 


On return from rdcounters, v0 contains the status as follows:

If v0 = 0     Interface is not implemented.
If v0 ≤ a1    v0 is length of data returned.
If v0 > a1    No data is returned and v0 is length of processor implementation
              counter record.
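
A caller might decode the returned status along the following lines. This is a minimal
sketch: the function name interpret_status and the returned strings are hypothetical, and
it assumes that a status equal to the buffer length counts as a successful return.

```python
def interpret_status(v0, a1):
    """Classify the status returned in v0, given the buffer length passed in a1."""
    if v0 == 0:
        return "not implemented"   # production PALcode: interface absent
    if v0 <= a1:
        return "data returned"     # v0 bytes were written to the buffer
    return "buffer too small"      # nothing written; v0 is the required length
```

When the result is "buffer too small", the caller can allocate a quadword-aligned buffer of
at least v0 bytes and issue the call again.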
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5.1.12 Read the Current IRQL from the PSR 


Format: 


rdirql ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
v0 ← PSR<IRQL>
GPR State Change: 


v0 ← PSR<IRQL>


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The rdirql instruction returns in v0 the contents of the interrupt request level (IRQL) field of
the PSR internal processor register.
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5.1.13 Read Initial Kernel Stack Pointer for the Current Thread 
Format: 


rdksp ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
v0 ← IKSP
GPR State Change: 


v0 ← IKSP


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The rdksp instruction returns in v0 the contents of the IKSP (initial kernel stack pointer)
internal processor register for the currently executing thread.
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5.1.14 Read the Machine Check Error Summary Register 


Format: 


rdmces ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception }
endif
v0 ← MCES
GPR State Change: 


v0 ← MCES


IPR State Change: 


None


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The rdmces instruction returns in v0 the contents of the machine check error summary
(MCES) internal processor register.
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5.1.15 Read the Processor Control Region Base Address 
Format: 


rdpcr ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif
v0 ← PCR
GPR State Change: 


v0 ← PCR


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The rdpcr instruction returns in v0 the contents of the PCR internal processor register (the base
address value of the processor control region).
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5.1.16 Read the Current Processor Status Register (PSR) 


Format: 


rdpsr ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
v0 ← PSR
GPR State Change: 


v0 ← PSR


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The rdpsr instruction returns in v0 the contents of the current PSR (Processor Status Register)
internal processor register.
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5.1.17 Read the Current Internal Processor State 


Format: 

rdstate ! PALcode format 
Operation: 

{ a0 = Pointer to 32-bit superpage address of state record buffer. }
{      Address must be quadword aligned }
{ a1 = Length of buffer in bytes }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Dump internal processor state record to processor state buffer }
v0 ← status


GPR State Change: 
v0 ← status
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction
Machine Checks 


Description: 


The rdstate instruction writes the internal processor state to the internal processor state buffer
pointed to by the address passed in the a0 register. The form and content of the internal
processor state buffer are implementation specific.


On return from the rdstate instruction, the return value register, v0, contains the status as
follows:

If v0 = 0     Interface is not implemented.
If v0 ≤ a1    v0 is length of data returned.
If v0 > a1    No data is returned and v0 is length of processor implementation
              counter record.
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5.1.18 Read the Thread Value for the Current Thread 
Format: 


rdthread ! PALcode format 


Operation: 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 
v0 ← THREAD
GPR State Change: 


v0 ← THREAD


IPR State Change: 


None 


Exceptions: 
Illegal Instruction 
Machine Checks 


Description: 


The rdthread instruction returns in v0 the contents of the THREAD internal processor register
(for the currently executing thread).
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5.1.19 Reboot — Transfer to Console Firmware 


Format: 


reboot ! PALcode format 


Operation: 


if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
RestartBlockPointer ← PcRestartBlock(PCR)
{ If cannot verify restart block, restart previous PALcode }
{ Save general register state in saved state area }
{ Save internal processor register state in saved state area, }
{ includes PAL_BASE }
{ Save implementation-specific data in saved state area }
{ Set the saved state length in restart block }
{ Compute and store checksum for restart block }
{ Restore previous privileged state }
PAL_BASE ← previous_PAL_BASE
RESTART_ADDRESS ← PcFirmwareRestartAddress(PCR)


GPR State Change: 
All registers are UNPREDICTABLE 


IPR State Change:
PAL_BASE ← previous_PAL_BASE
RESTART_ADDRESS ← PcFirmwareRestartAddress(PCR)
All other registers are UNPREDICTABLE 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The reboot instruction stops the operating system from executing and returns execution to the
boot environment. Reboot is responsible for completing the ARC Restart Block before
returning to the boot environment. The PALcode must accomplish two tasks to restore the boot
environment: re-establish the boot environment PALcode and restart execution in the boot
environment at the Firmware Restart Address.
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5.1.20 Restart the Operating System from the Restart Block 
Format: 


restart ! PALcode format 


Operation: 
{ a0 = Pointer to ARC restart block with Alpha saved state area }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Verify restart block }
{ if invalid then return to caller }
RestartBlockPointer ← PcRestartBlock(PCR)
{ Restore general register state from saved state area }
{ Restore internal processor register state from saved state area }
{ Restore implementation-specific data from saved state area }
RESTART_ADDRESS ← RbRestartAddress(RestartBlockPointer)


GPR State Change: 
All registers are UNPREDICTABLE 


IPR State Change: 
RESTART_ADDRESS ← RbRestartAddress(RestartBlockPointer)
All other registers are UNPREDICTABLE


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 
The restart instruction restores saved processor state and resumes execution of the operating 
system. 
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5.1.21 Return from System Service Call Exception 


Format: 

retsys ! PALcode format 
Operation: 

{ a0 = Previous PSR }
{ a1 = New software interrupt requests }
{ fp = Pointer to trap frame }
{ v0 = Return status from system service }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
if ( a1<1> EQ 1 ) then
    SIRR<DPC> ← 1
endif
if ( a1<0> EQ 1 ) then
    SIRR<APC> ← 1
endif
TrapFrame ← fp
ra ← TrIntRa(TrapFrame)
gp ← TrIntGp(TrapFrame)
fp ← TrIntFp(TrapFrame)
sp ← TrIntSp(TrapFrame)
RESTART_ADDRESS ← TrFir(TrapFrame)
PSR ← a0
{ Clear lock_flag register }
{ Clear intr_flag register }


GPR State Change: 
ra ← TrIntRa(TrapFrame)
gp ← TrIntGp(TrapFrame)
fp ← TrIntFp(TrapFrame)
sp ← TrIntSp(TrapFrame)
at, t0-t12, and a0-a5 are UNPREDICTABLE


IPR State Change: 
PSR ← a0
RESTART_ADDRESS ← TrFir(TrapFrame)
SIRR ← a1<1...0>
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Exceptions: 
Illegal Instruction 
Machine Checks 
Invalid Kernel Stack 


Description: 

The retsys instruction returns from a system service call exception by unwinding the trap
frame, clearing the lock_flag and intr_flag (interrupt flag) registers, and returning to the code
stream that was executing when the original exception was initiated. Retsys must return to the
native code stream; returning to the PALcode environment is illegal, and the implementation
must guarantee that it never happens. In addition, retsys accepts a parameter to set software
interrupt requests that became pending while the exception was handled.


Retsys is similar to the rfe instruction, with the following exceptions:

1. Retsys need not restore the argument registers a0-a3 from the trap frame.
2. Retsys need not preserve volatile register state.
3. Retsys returns to the address in the ra register at the point of the callsys rather than the
   faulting instruction address (the ra was written to the faulting instruction address by
   callsys).
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5.1.22 Return from Exception or Interrupt 
Format: 


rfe ! PALcode format 


Operation: 


{ a0 = Previous PSR }
{ a1 = New software interrupt requests }
{ fp = Pointer to trap frame }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
if ( a1<1> EQ 1 ) then
    SIRR<DPC> ← 1
endif
if ( a1<0> EQ 1 ) then
    SIRR<APC> ← 1
endif

PSR ← a0
TrapFrame ← fp

a0 ← TrIntA0(TrapFrame)
a1 ← TrIntA1(TrapFrame)
a2 ← TrIntA2(TrapFrame)
a3 ← TrIntA3(TrapFrame)
ra ← TrIntRa(TrapFrame)
gp ← TrIntGp(TrapFrame)
fp ← TrIntFp(TrapFrame)
sp ← TrIntSp(TrapFrame)

RESTART_ADDRESS ← TrFir(TrapFrame)

{ Clear lock_flag register }


GPR State Change: 
a0 ← TrIntA0(TrapFrame)
a1 ← TrIntA1(TrapFrame)
a2 ← TrIntA2(TrapFrame)
a3 ← TrIntA3(TrapFrame)
ra ← TrIntRa(TrapFrame)
gp ← TrIntGp(TrapFrame)
fp ← TrIntFp(TrapFrame)
sp ← TrIntSp(TrapFrame)
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IPR State Change: 
PSR ← a0
RESTART_ADDRESS ← TrFir(TrapFrame)
SIRR ← a1<1...0>


Exceptions: 


Illegal Instruction 
Machine Checks 
Invalid Kernel Stack 


Description: 


The rfe instruction returns from exceptions or interrupts by unwinding the trap frame, clearing
the lock_flag register, and returning to the code stream that was executing when the original
exception or interrupt was initiated. Rfe must return to the native code stream; returning to
the PALcode environment is illegal, and the implementation must guarantee that it never
happens. In addition, rfe accepts a parameter to set software interrupt requests that became
pending while the event was handled.
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5.1.23 Set Software Interrupt Request 


Format: 


ssir ! PALcode format 


Operation: 


{ a0 = Software interrupt requests to set } 


if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
if ( a0<1> EQ 1 ) then
    SIRR<DPC> ← 1
endif
if ( a0<0> EQ 1 ) then
    SIRR<APC> ← 1
endif


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 
SIRR ← a0<1...0>


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The ssir instruction sets software interrupt requests by setting the appropriate bits in the SIRR 
internal processor register. See Section 4.2.7. 
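
The request bits are only ever set by ssir, never cleared, which the following Python
sketch tries to make explicit. The dictionary representation of SIRR and the function
signature are assumptions made for the example; the privilege check is omitted.

```python
def ssir(sirr, a0):
    """Return a copy of SIRR with the requests selected by a0<1...0> set.

    sirr is a hypothetical dict with "APC" and "DPC" keys (0 or 1 each).
    """
    new_sirr = dict(sirr)
    if (a0 >> 1) & 1:      # a0<1> EQ 1
        new_sirr["DPC"] = 1
    if a0 & 1:             # a0<0> EQ 1
        new_sirr["APC"] = 1
    return new_sirr
```

A zero bit in a0 leaves the corresponding request untouched, so an already pending request
stays pending.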
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5.1.24 Swap Thread Context 
Format: 


swpctx ! PALcode format 


Operation: 
{ a0 = New initial kernel stack va }
{ a1 = New thread address }
{ a2 = New thread environment block pointer }
{ a3 = New address space page frame number (PFN) }
{      or a negative number }
{ a4 = ASN }
{ a5 = ASN wrap indicator }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
IKSP ← a0
THREAD ← a1
TEB ← a2
ASN_wrap_indicator ← a5
if ( a3 GE 0 ) then                 ! swap address space
    temp ← SHIFT_LEFT( a3, PAGE_SHIFT )
    PDR ← temp
    ASN ← a4
    if ( ASN_wrap_indicator NE 0 ) then
        { Invalidate all translations and virtual cache blocks }
        { for which ASM EQ 0 }
    endif
endif

{ Where: }
{ 2**PAGE_SHIFT = implementation page size }


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 
IKSP ← a0
THREAD ← a1
TEB ← a2
ASN ← a4 (possibly)
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Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The swpctx instruction swaps the privileged portions of thread context. Thread context is 
swapped by establishing the new IKSP, THREAD, and TEB internal processor register values. 


Swpctx may also swap the address space (or process) for the new thread. If the new thread is
in the same process (address space) as the previous thread, the kernel passes a negative value
for the page frame number (PFN) of the page directory page, indicating that the address space
need not be switched. If the PFN is zero or a positive number, it is used to swap the address
space, just as if swpprocess had been executed.
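
The two cases (thread switch only, or thread switch plus address space switch) can be
sketched in Python as follows. The dictionary of IPR values, the returned flush flag, and
PAGE_SHIFT = 13 (an 8 KB page) are assumptions made for the example; the real page size is
implementation specific, and the privilege check is omitted.

```python
PAGE_SHIFT = 13  # assumed: 2**13 = 8 KB pages (implementation specific)


def swpctx(ipr, a0, a1, a2, a3, a4, a5):
    """Return (new IPR dict, flush flag) for a hypothetical swpctx model."""
    new_ipr = dict(ipr)
    new_ipr["IKSP"] = a0
    new_ipr["THREAD"] = a1
    new_ipr["TEB"] = a2
    flush_non_asm = False
    if a3 >= 0:                       # non-negative PFN: swap address space
        new_ipr["PDR"] = a3 << PAGE_SHIFT
        new_ipr["ASN"] = a4
        # On ASN wrap, entries with ASM EQ 0 must be invalidated.
        flush_non_asm = a5 != 0
    return new_ipr, flush_non_asm
```

Passing a negative a3 leaves PDR and ASN untouched in this model, which corresponds to a
switch between two threads of the same process.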
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5.1.25 Swap the Current IRQL (Interrupt Request Level) 
Format: 


swpirql ! PALcode format 
Operation: 
{ a0 = New IRQL }
if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif

v0 ← PSR<IRQL>
PSR<IRQL> ← a0


GPR State Change: 
v0 ← PSR<IRQL>
a0-a5 are UNPREDICTABLE


IPR State Change: 
PSR<IRQL> ← a0


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The swpirql instruction swaps the current IRQL field in the PSR internal processor register for
the specified new IRQL, setting the processor so that only interrupts permitted by the new
IRQL are enabled. Swpirql updates the IRQL field and returns in v0 the previous IRQL.
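
Because the previous IRQL is returned, a caller can raise the level around a critical
region and then restore it. The following Python sketch shows that pattern; the Psr class
and the level values are hypothetical and the privilege check is omitted.

```python
class Psr:
    """Hypothetical stand-in for the PSR, modelling only the IRQL field."""

    def __init__(self, irql=0):
        self.irql = irql


def swpirql(psr, new_irql):
    """Write the new IRQL and return the previous one (v0 in the text)."""
    previous = psr.irql
    psr.irql = new_irql
    return previous


def run_at_irql(psr, level, work):
    """Run work() at the given IRQL, then restore the caller's IRQL."""
    saved = swpirql(psr, level)
    try:
        return work()
    finally:
        swpirql(psr, saved)
```

For example, run_at_irql(psr, 5, handler) would execute handler with the IRQL field set to
5 and leave the field at its original value afterwards.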
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5.1.26 Swap the Initial Kernel Stack Pointer (IKSP) for the Current Thread 
Format: 


swpksp ! PALcode format 
Operation: 
{ a0 = New IKSP } 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 


v0 ← IKSP
IKSP ← a0


GPR State Change: 
v0 ← IKSP
a0-a5 are UNPREDICTABLE


IPR State Change: 
IKSP ← a0


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The swpksp instruction returns in v0 the value of the previous IKSP internal processor register
and writes a new IKSP for the currently executing thread.
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5.1.27 Swap the Currently Executing PALcode 


Format: 


swppal ! PALcode format 


Operation: 
{ a0 = Physical base address of new PALcode } 
{ al-a5 = Arguments to the new PALcode environment } 


if ( PSR<MODE> EQ User ) then 

{ Initiate illegal instruction exception } 
endif 
{ load processor-dependent parameters } 
{ jump to address in a0 as a physical address in } 
{ the PALcode environment } 


GPR State Change: 
at and t0-t12 are UNPREDICTABLE or contain processor-dependent parameters 


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The swppal instruction swaps the currently executing PALcode by transferring to the base 
address of the new PALcode image (provided in a0) in the PALcode environment. 
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5.1.28 Swap Process Context (Swap Address Space) 


Format: 

swpprocess ! PALcode format 
Operation: 

{ a0 = Page frame number (PFN) of new PDR }
{ a1 = Address space number (ASN) of new process }
{ a2 = Address space number wrap indicator (ASN_wrap_indicator): }
{      zero = no wrap }
{      nonzero = wrap }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
temp ← SHIFT_LEFT( a0, PAGE_SHIFT )
PDR ← temp
ASN ← a1
if ( ASN_wrap_indicator NE 0 ) then
    { Invalidate all translations and virtual cache blocks }
    { for which ASM EQ 0 }
endif

{ Where: }
{ 2**PAGE_SHIFT = implementation page size }


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 
PDR ← a0
ASN ← a1


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The swpprocess instruction swaps the privileged process context by changing the address
space for the currently executing thread. The address space change is accomplished by
establishing a new PDR and ASN. If the ASN_wrap_indicator passed in a2 is nonzero, swpprocess
causes invalidation of all translation buffer entries and virtual cache blocks that have a clear
address space match (ASM) bit.
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5.1.29 Translation Buffer Invalidate All 


Format: 
tbia ! PALcode format


Operation: 


if ( PSR<MODE> EQ User ) then 

{ Initiate illegal instruction exception } 
endif 
{ Invalidate all translations and virtual cache blocks } 
{ within the processor } 


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 
Illegal Instruction 
Machine Checks 


Description: 


The tbia instruction invalidates all translations and virtual cache blocks within the processor. 
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5.1.30 Translation Buffer Invalidate Multiple 


Format: 

tbim ! PALcode format 
Operation: 

{ a0 = Pointer to array of virtual addresses to invalidate }
{ a1 = Number of virtual addresses to invalidate }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Invalidate translations for virtual addresses pointed to in a0 for }
{ the number of entries in a1. Invalidate in all translation }
{ buffers and all virtual caches }


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 

The tbim instruction invalidates multiple virtual translations for the current ASN. The
translations for the virtual addresses must be invalidated in all processor translation buffers
and virtual caches.
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5.1.31 Translation Buffer Invalidate Multiple for ASN 


Format: 
tbimasn ! PALcode format 
Operation: 
{ a0 = Pointer to array of virtual addresses to invalidate }
{ a1 = Number of virtual addresses to invalidate }
{ a2 = Address space number (ASN) }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Invalidate translations for the virtual addresses in the array }
{ pointed to in a0, for the number of entries in a1, that match the }
{ ASN in a2. Invalidate in all translation buffers and virtual caches }


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The tbimasn instruction invalidates multiple virtual translations for a specified ASN. The
translations for the virtual addresses must be invalidated in all processor translation buffers
and virtual caches.




5.1.32 Translation Buffer Invalidate Single 


Format: 
tbis ! PALcode format 


Operation:
{ a0 = Virtual address of translation to invalidate } 
if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 


{ Invalidate all translations for the virtual address in a0, } 
{ invalidate in all translation buffers and all virtual caches } 


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The tbis instruction invalidates a single virtual translation. The translation for the passed
virtual address must be invalidated in all processor translation buffers and virtual caches.
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5.1.33 Translation Buffer Invalidate Single for ASN 


Format: 


tbisasn ! PALcode format


Operation: 
{ a0 = Virtual address of translation to invalidate }
{ a1 = Address space number (ASN) }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
{ Invalidate the translation for the virtual address in a0 }
{ that matches the ASN in a1. The translation must be invalidated }
{ in all translation buffers and virtual caches }


GPR State Change: 
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions:


Illegal Instruction 
Machine Checks 


Description: 


The tbisasn instruction invalidates a single virtual translation for a specified address space 
number. The translation for the passed virtual address must be invalidated in all processor 
translation buffers and virtual caches. 
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5.1.34 Write Kernel Exception Entry Routine 
Format: 


wrentry ! PALcode format 


Operation: 
{ a0 = Address of exception entry routine, 32-bit }
{      superpage address }
{ a1 = Exception class value }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
case a1 begin
    0:
        PANIC_ENTRY ← a0
        break;
    1:
        MEM_MGMT_ENTRY ← a0
        break;
    2:
        INTERRUPT_ENTRY ← a0
        break;
    3:
        SYSCALL_ENTRY ← a0
        break;
    4:
        GENERAL_ENTRY ← a0
        break;
    otherwise:
        { Initiate panic exception }
endcase;

GPR State Change: 


a0-a5 are UNPREDICTABLE


IPR State Change: 
*_ENTRY ← a0


Exceptions: 


Illegal Instruction 
Machine Checks 
Panic Exception 
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Description: 


The wrentry instruction registers the exception handling routines for the exception classes.
The address in a0 is registered for the exception class corresponding to the exception
class value in a1. The kernel must use wrentry to register an exception handler for each of the
exception classes. The relationship between the exception classes and class values is shown in
Table 5-2.


Table 5-2: Exception Class Values 


Exception Class                     Value

Panic exceptions                    0
Memory management exceptions        1
Interrupt exceptions                2
System service call exceptions      3
General exceptions                  4
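
The registration behavior can be sketched in Python as a lookup from class value to entry
register. The dictionary of entry registers, the PanicException class, and the function
signature are assumptions made for the example; the privilege check is omitted.

```python
# Class values from Table 5-2, mapped to the entry register each one selects.
ENTRY_BY_CLASS = {
    0: "PANIC_ENTRY",
    1: "MEM_MGMT_ENTRY",
    2: "INTERRUPT_ENTRY",
    3: "SYSCALL_ENTRY",
    4: "GENERAL_ENTRY",
}


class PanicException(Exception):
    """Stands in for the panic exception raised on an unknown class value."""


def wrentry(entries, a0, a1):
    """Register handler address a0 for exception class a1 in the entries dict."""
    name = ENTRY_BY_CLASS.get(a1)
    if name is None:
        raise PanicException(f"unknown exception class value {a1}")
    entries[name] = a0
```

A kernel following the text would call wrentry once for each of the five class values
during initialization.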
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5.1.35 Write the Machine Check Error Summary Register 
Format: 


wrmces ! PALcode format 


Operation: 


{ a0 = New values for the machine check error }
{      summary (MCES) register. }

if ( PSR<MODE> EQ User ) then
    { Initiate illegal instruction exception }
endif
v0 ← MCES
MCES<DMK> ← a0<5>
MCES<DSC> ← a0<4>
MCES<DPC> ← a0<3>
if ( a0<2> EQ 1 ) then
    MCES<PCE> ← 0
endif
if ( a0<1> EQ 1 ) then
    MCES<SCE> ← 0
endif
if ( a0<0> EQ 1 ) then
    MCES<MCK> ← 0
endif


GPR State Change: 
v0 ← previous MCES


IPR State Change: 
MCES ← a0


Exceptions: 


Illegal Instruction 
Machine Checks


Description: 


The wrmces instruction writes new values for the MCES internal processor register and
returns in v0 the previous values of that register.
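
In the operation shown for wrmces, the DMK, DSC, and DPC fields are copied directly from
a0, while writing a one to bits 2, 1, or 0 clears PCE, SCE, or MCK respectively. The
following Python sketch models that behavior; representing MCES as a dictionary of fields
is an assumption made for the example, and the privilege check is omitted.

```python
def wrmces(mces, a0):
    """Return (previous MCES, new MCES) for a hypothetical field-level model."""
    previous = dict(mces)
    new = dict(mces)
    # These three fields take the value of the corresponding a0 bit.
    new["DMK"] = (a0 >> 5) & 1
    new["DSC"] = (a0 >> 4) & 1
    new["DPC"] = (a0 >> 3) & 1
    # These three fields are cleared when a one is written; a zero leaves them alone.
    for bit, field in ((2, "PCE"), (1, "SCE"), (0, "MCK")):
        if (a0 >> bit) & 1:
            new[field] = 0
    return previous, new
```

For instance, passing a0 with only bit 0 set clears MCK and leaves PCE and SCE as they
were, while also writing zero to DMK, DSC, and DPC.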
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5.1.36 Write Performance Counter Interrupt Control Information 


Format: 


wrperfmon ! PALcode format


Operation: 


if ( PSR<MODE> EQ User ) then 
{ Initiate illegal instruction exception } 
endif 


{ a0 - a5 contain implementation-specific input values } 


GPR State Change: 
v0 ← implementation-dependent value
a0-a5 are UNPREDICTABLE


IPR State Change: 


None 


Exceptions: 


Illegal Instruction 
Machine Checks 


Description: 


The wrperfmon instruction controls any performance monitoring mechanisms in the processor 
and PALcode. The wrperfmon instruction arguments and actions are chip dependent, and 
when defined for an implementation, are described in Appendix E. 
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5.2 Unprivileged PALcode Instructions 


Table 5-3: Unprivileged PALcode Instruction Summary 


Mnemonic Description 

bpt Breakpoint trap 

callkd Call kernel debugger 

callsys Call system service 

gentrap Generate trap 

imb Instruction memory barrier 

kbpt Kernel breakpoint trap 

rdteb Read thread environment block pointer 
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5.2.1 Breakpoint Trap (Standard User-Mode Breakpoint) 


Format: 


bpt ! PALcode format 


Operation: 
See Sections 4.1.7.8 and 4.1.7.6 


GPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


IPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


Exceptions: 


Machine Checks 
Kernel Stack Invalid 


Description: 


The bpt instruction raises a breakpoint general exception to the kernel, setting a 
USER_BREAKPOINT breakpoint type. 
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5.2.2 Call Kernel Debugger 


Format: 


callkd ! PALcode format 


Operation: 
{ v0 = Type of breakpoint }
See Sections 4.1.7.8 and 4.1.7.6 


GPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


IPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


Exceptions: 


Machine Checks 
Kernel Stack Invalid 


Description: 


The callkd instruction raises a breakpoint general exception to the kernel, setting the
breakpoint type with the value supplied in v0. The callkd instruction implements special calls
to the kernel debugger.
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5.2.3 System Service Call 
Format: 


callsys ! PALcode format 


Operation: 


{ v0 = System service code }
{ a0-a5 = System call arguments }

previousPSR ← PSR
if ( PSR<MODE> EQ UserMode ) then
    PSR<MODE> ← KernelMode
    tp ← (IKSP - TrapFrameLength)      ! Establish trap pointer
else
    tp ← (sp - TrapFrameLength)        ! Establish trap pointer
endif
TrIntSp(tp) ← sp
TrIntFp(tp) ← fp
TrIntRa(tp) ← ra
TrIntGp(tp) ← gp
TrFir(tp) ← ra
TrPsr(tp) ← previousPSR
gp ← KGP
sp ← tp
fp ← tp

t0 ← previousPSR<MODE>
t1 ← THREAD
RESTART_ADDRESS ← SYSCALL_ENTRY


GPR State Change: 
fp ← tp
gp ← KGP
sp ← tp
t0 ← previousPSR<MODE>
t1 ← THREAD
at and t0-t12 are UNPREDICTABLE


IPR State Change: 
PSR<MODE> ← KernelMode
RESTART_ADDRESS ← SYSCALL_ENTRY


Exceptions: 


Machine Checks 
Kernel Stack Invalid 
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Description: 


The callsys instruction raises a system service call exception to the kernel. The system service
call has the software semantics of a standard procedure call. That is, arguments are passed in
argument registers and on the stack, volatile registers are considered free, and nonvolatile
registers must be preserved across the call. In addition to the standard calling sequence, callsys
is passed the number of the desired system service in the return value register v0. Callsys does
not interpret this value, but rather passes it directly to the operating system.


Callsys switches to kernel mode if necessary, builds a trap frame on the kernel stack, and then 
enters the kernel at the kernel system service exception handler. See Section 4.1.6. 


The argument registers must be preserved through the instruction. Standard control information,
such as the previous PSR, is stored in the trap frame. Callsys then restarts execution at
the kernel system service call exception entry, passing the previous mode as a parameter in the
t0 register, and the current thread as a parameter in the t1 register.
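

The callsys operation above can be followed step by step in a small software model. The
sketch below is illustrative only: the mode encodings, the trap frame length, the dictionary
used as a trap frame, and the reduction of the PSR to its MODE field are all assumptions made
for the example and are not taken from this specification.

```python
from dataclasses import dataclass, field

# Hypothetical encodings and sizes, chosen only for this illustration.
KERNEL_MODE = 0
USER_MODE = 1
TRAP_FRAME_LENGTH = 0x200


@dataclass
class Cpu:
    mode: int                 # stands in for PSR<MODE>; the rest of the PSR is not modeled
    regs: dict                # general registers by name (sp, fp, ra, gp, v0, ...)
    iksp: int                 # initial kernel stack pointer (IKSP)
    kgp: int                  # kernel global pointer (KGP)
    thread: int               # THREAD internal processor register
    syscall_entry: int        # SYSCALL_ENTRY internal processor register
    restart_address: int = 0  # RESTART_ADDRESS internal processor register
    frames: dict = field(default_factory=dict)  # trap frames keyed by trap pointer


def callsys(cpu):
    """Model the documented callsys operation; returns the trap pointer."""
    previous_mode = cpu.mode
    if cpu.mode == USER_MODE:
        cpu.mode = KERNEL_MODE
        tp = cpu.iksp - TRAP_FRAME_LENGTH        # frame goes on the kernel stack
    else:
        tp = cpu.regs["sp"] - TRAP_FRAME_LENGTH  # already on the kernel stack
    r = cpu.regs
    cpu.frames[tp] = {
        "TrIntSp": r["sp"],
        "TrIntFp": r["fp"],
        "TrIntRa": r["ra"],
        "TrIntGp": r["gp"],
        "TrFir": r["ra"],
        "TrPsr": previous_mode,
    }
    r["gp"] = cpu.kgp
    r["sp"] = tp
    r["fp"] = tp
    r["t0"] = previous_mode      # previous mode passed to the kernel
    r["t1"] = cpu.thread         # current thread passed to the kernel
    cpu.restart_address = cpu.syscall_entry
    return tp
```

Note that v0 and the argument registers are never written by the model, which mirrors the
requirement that callsys pass the service number and arguments through untouched.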


5.2.4 Generate a Trap


Format: 
gentrap ! PALcode format


Operation: 
{ a0 = Trap number that identifies exception } 


See Sections 4.1.7.8 and 4.1.7.5 


GPR State Change: 
See Sections 4.1.7.8 and 4.1.7.5 


IPR State Change: 
See Sections 4.1.7.8 and 4.1.7.5 


Exceptions: 


Machine Checks 
Kernel Stack Invalid 


Description: 


The gentrap instruction generates a software general exception to the current thread. The
exception code is generated from a trap number that is specified as an input parameter.
Gentrap is used to raise software-detected exceptions such as bound check errors or overflow
conditions.


5.2.5 Instruction Memory Barrier


Format: 


imb ! PALcode format 


Operation: 


{ From within kernel mode, make processor } 
{ instruction stream coherent with main memory } 


GPR State Change: 


None 


IPR State Change: 


None 


Exceptions: 


Machine Checks 


Description: 


The imb instruction may only be called from kernel mode and guarantees that all subsequent 
instruction stream fetches are coherent with respect to main memory on the current processor. 
Imb must be issued before executing code in memory that has been modified (either by stores 
from the processor or DMA from an I/O processor). See Common Architecture, Chapter 6. 


User-mode software must not use the imb instruction; it must instead use the appropriate
Windows NT interface to make the I-cache coherent.


5.2.6 Kernel Breakpoint Trap


Format: 


kbpt ! PALcode format 


Operation: 
See Sections 4.1.7.8 and 4.1.7.6 


GPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


IPR State Change: 
See Sections 4.1.7.8 and 4.1.7.6 


Exceptions: 


Machine Checks 
Kernel Stack Invalid 


Description: 


The kbpt instruction raises a breakpoint general exception to the kernel, setting a 
KERNEL_BREAKPOINT breakpoint type. 


5.2.7 Read Thread Environment Block Pointer


Format: 


rdteb ! PALcode format 


Operation: 
v0 ← TEB


GPR State Change: 
v0 ← TEB


IPR State Change: 


None 


Exceptions: 


Machine Checks 


Description: 


The rdteb instruction returns in v0 the contents of the TEB internal processor register for the
currently executing thread (the base address of the thread environment block). See Section 2.7.


5.3 Debug PALcode and Free PALcode


The debug PALcode is a functional superset of the production PALcode, which is specified in 
this document. The debug PALcode includes extra counters for performance evaluation and 
additional sanity checks. An unacceptable performance loss would occur if these features were 
implemented in production PALcode. Therefore, the debug PALcode is used in the laboratory 
only. 


The debug PALcode contains the following additional features: 
• Kernel stack underflow/overflow checking
• Special I/O address checking
• Event counters


5.3.1 Kernel Stack Checking 


The debug PALcode checks for kernel stack underflow and overflow whenever it allocates a
trap frame and the previous mode was kernel mode. Two pages of kernel stack are allocated
for each thread.


• Underflow occurs when the thread's kernel mode stack pointer (SP) is greater than the
initial kernel stack pointer (IKSP).


• Overflow is detected whenever the SP would be less than (IKSP - 2 * PAGE_SIZE).


Kernel stack underflow and overflow are indicated with a panic exception, described in
Section 4.1.8.
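

The two bounds above amount to a simple range test on the prospective stack pointer. The
following is a minimal sketch of that test; the 8K-byte page size and the function name are
assumptions made for the example, since the page size is implementation specific.

```python
PAGE_SIZE = 8192          # assumption: 8K-byte pages; implementation specific
KERNEL_STACK_PAGES = 2    # two pages of kernel stack per thread, per the text


def check_kernel_stack(sp, iksp, page_size=PAGE_SIZE):
    """Classify a prospective kernel SP against the documented bounds.

    sp is the value the stack pointer would have after the trap frame is
    allocated; iksp is the thread's initial kernel stack pointer.
    """
    if sp > iksp:
        return "underflow"
    if sp < iksp - KERNEL_STACK_PAGES * page_size:
        return "overflow"
    return "ok"
```

In the debug PALcode either failing case would raise a panic exception; the sketch returns a
string instead so that the boundary conditions are easy to inspect.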


Implementation Note: 


Alpha implementations that do not include the BWX extension (described in Appendix
D) cannot provide direct access to I/O space addresses (as would Intel-based systems).
Instead, those Alpha implementations provide access to I/O space by allowing the
standard device drivers to use address handles, provided by the HAL, that may be treated
as standard I/O virtual addresses for all operations except the I/O accesses. The I/O
accesses must be performed by specialized routines in the HAL that are able to convert the
address handles to the actual virtual addresses used for the I/O space accesses.


By convention, the HAL uses the range of numbers A0000000₁₆ through BFFFFFFF₁₆ to
represent these address handles whenever possible. This range of numbers falls into the
upper half of the 32-bit superpage address range. The debug PALcode disables the 32-bit
superpage in hardware and provides support for the lower half of the 32-bit superpage in
PALcode (the range of addresses 80000000₁₆ through 9FFFFFFF₁₆). Addresses in the
range A0000000₁₆ through BFFFFFFF₁₆ are treated as standard addresses and, since they
are not mapped, cause memory management faults (translation not valid). This support in
the PALcode allows easy and precise trapping of device driver code that attempts to
access I/O addresses directly, without using the intended access routines provided by the
HAL.


Note:
Physical system memory is limited to 512M bytes when running with the debug PALcode.
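

The address ranges named in the implementation note can be expressed as a small classifier.
This is a sketch for illustration only; the function name and the returned labels are
hypothetical. The lower-half range spans exactly 512M bytes, which is consistent with the
memory limit in the note above, although the text does not state that this is the reason for
the limit.

```python
# Ranges taken from the implementation note (end values are exclusive here).
SUPERPAGE_LOWER_HALF = range(0x8000_0000, 0xA000_0000)  # emulated by debug PALcode
IO_HANDLE_RANGE = range(0xA000_0000, 0xC000_0000)       # HAL address handles


def classify_address(addr):
    """Return a label describing how the debug PALcode would treat addr."""
    if addr in SUPERPAGE_LOWER_HALF:
        return "superpage"        # translated by PALcode superpage support
    if addr in IO_HANDLE_RANGE:
        return "io-handle"        # unmapped: a direct access faults
    return "other"                # ordinary address, handled by normal translation
```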


5.3.2 Event Counters 


The debug PALcode provides software counters to count significant events within the PALcode.
The PALcode also provides the privileged rdcounters instruction to allow kernel-mode
code to read the counters. The counted events are implementation specific but must include
the following: a separate counter for each of the different PALcode instructions, TB miss
counts, and interrupt counts. The format of the data returned by rdcounters is also
implementation specific. However, all counters must be 64-bit counters.
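

As a rough illustration of the 64-bit requirement, the sketch below models a set of event
counters that wrap modulo 2**64. The class, its method names, and the event names are
hypothetical; the real counter set and the rdcounters data format are implementation specific.

```python
MASK64 = (1 << 64) - 1


class EventCounters:
    """Toy model of debug PALcode event counters, each held to 64 bits."""

    def __init__(self, event_names):
        self._counts = {name: 0 for name in event_names}

    def count(self, name):
        # Increment and truncate to 64 bits, as a hardware-width counter would.
        self._counts[name] = (self._counts[name] + 1) & MASK64

    def rdcounters(self):
        # Return a snapshot; the real instruction's format is not specified here.
        return dict(self._counts)
```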


5.4 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP → Alpha
2. Windows NT AXP → Windows NT Alpha
3. Updated for NT 4.0
4. Added TBIM and TBIMASN


Revision 6.0, December 12, 1994
1. Created from Windows NT Alpha specifications


Chapter 6


Initialization and Firmware Transitions (II-C)


This chapter describes the four phases of PALcode environment initialization and the PALcode
functions that provide the transition between the operating system and the firmware.


6.1 Initialization 


From the perspective of the PALcode environment there are four phases of initialization: 
1. Internal system-specific processor state is established before the PALcode runs. 
2. PALcode initializes the internal processor state. 


3. The kernel uses PALcode initialization callback instructions to prepare the PALcode to 
handle exceptions. 


4. Interrupt tables are initialized so that standard interrupt support can be used. 


6.1.1 Pre-PALcode Initialization 


Firmware must set the processor and system to a known good state before the PALcode entry 
point is called. The firmware must initialize any internal processor registers that contain sys- 
tem-specific parameters such as timing or memory size information. This is necessary because 
the PALcode is entirely independent of the system. The firmware must ensure that all caches 
are coherent with main memory before calling the PALcode and that the memory system has 
been fully initialized. 


Hardware Implementation Note: 


If system configuration information is written to write-only IPRs, those configuration 
IPRs cannot have any control bits that need to be written by the platform-independent 
operating system PALcode. If such bits were written in that manner, the firmware would 
have to pass the configuration information in internal processor state on a 
per-implementation basis. Hardware designers should consider allowing configuration 
registers to be read as well as written to allow the platform-independent layer to have 
visibility to the full internal processor state. 


6.1.2 PALcode Initialization 


The PALcode is entered at the first instruction at the base of the PALcode image. PALcode is
called with the page frame number (PFN) of the PCR as a parameter in a1. All other argument
registers must be preserved across PALcode initialization and are considered parameters to the
operating system and are not interpreted by the PALcode. That is, the PALcode is free to
destroy volatile general-purpose integer and floating-point registers, but must preserve the
nonvolatile register state across the call. Register volatility is listed in Section 1.2. The
PALcode must accomplish the following initialization:


1. Deassert all interrupt requests and disable all interrupt enables (this includes software,
hardware and asynchronous trap interrupts).


2. Set the processor status register (PSR) such that interrupts are enabled, interrupt request
level is set to high level (7), and the mode is kernel.


3. Invalidate all virtual translation buffers.


4. Establish all required superpage mapping: 32-bit I-stream and D-stream, and 43-bit
D-stream mapping.


5. Set the previous_PAL_BASE register to the previous value of the PAL_BASE register.


6. Set the PAL_BASE register to the base address of the PALcode image.


7. Set the interrupt level table so that no interrupts are enabled for all interrupt levels.


8. Initialize all architected internal processor registers to their specified initialization
values.


9. Begin any required implementation-specific initialization, such as unlocking error
registers.


When the PALcode has completed its initialization, it resumes execution at the address passed 
in the ra (return address) register. 


6.1.3 Kernel Callback Initialization of PALcode 


The kernel uses the initpal and wrentry instructions to call back into the PALcode with the
initialization values that allow exceptions to be handled properly between the PALcode and the
kernel.


The kernel uses initpal to establish system-permanent context and per-thread context for the
initialization thread. The system-permanent context passed to initpal is the kernel global
pointer (KGP), which is passed via the gp register.


The initialization thread data passed in initpal are the page directory page, the initial kernel
stack pointer, and the initialization thread address. The page directory page and thread address
are passed as standard parameters; the kernel stack pointer is passed in the sp register. The
initpal instruction also initializes the PALcode information section of the processor control
region.


The kernel uses wrentry to register the kernel exception entry points with the PALcode. The 
wrentry instruction is called once for each kernel exception entry point. Each call includes the 
exception entry point address and the number of the exception class it handles. 
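

The registration pattern described above (one wrentry call per exception class) can be
pictured as filling in a small lookup table that the PALcode later consults when dispatching.
The sketch below is only an illustration: the class name, the method names, and the exception
class numbers used in any caller are hypothetical, and the architected class values are
defined elsewhere in this specification.

```python
class KernelEntryTable:
    """Toy model of the kernel entry points recorded through wrentry."""

    def __init__(self):
        self._entries = {}

    def wrentry(self, entry_address, exception_class):
        # One call per exception class; a later call for the same class replaces it.
        self._entries[exception_class] = entry_address

    def entry_for(self, exception_class):
        # Look up where execution would resume for a given exception class.
        try:
            return self._entries[exception_class]
        except KeyError:
            raise LookupError(
                f"no kernel entry registered for class {exception_class}"
            ) from None
```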


6.1.4 Interrupt Table Initialization


The interrupt table values in the processor control region are system specific and so are not
initialized until HAL initialization. Until these tables are initialized, the PALcode uses
interrupt tables that are initialized such that all interrupts are disabled. An implementation
may choose to cache some portion of the interrupt tables within the processor. After the
operating system has established the interrupt tables, an implementation may use the initpcr
instruction to cache some part of those tables.


6.2 Firmware Interfaces 


The firmware PALcode environment is decoupled from the operating system PALcode. The 
reboot/restart and swppal instructions permit the transition between the operating system and 
the firmware PALcode context. 


6.2.1 Reboot Instruction — Transition to Firmware PALcode Context 


The reboot instruction performs a controlled transition to the firmware PALcode context.
Reboot essentially follows the semantics for a return to the ARC (Advanced RISC Computing)
firmware environment, with the addition of Alpha support for switching to the firmware
PALcode. The reboot function accomplishes the following tasks:


1. Retrieves the restart block pointer from the processor control region. 


The restart block is expected to be initialized by the firmware. The pointer to the 
restart block is passed by the firmware through the OS Loader to the kernel in the 
loader parameter block. The kernel writes the restart block pointer into the processor 
control region during startup. The restart block pointer must be a 32-bit superpage 
address. 


The firmware environment is responsible for allocating memory for the entire restart 
block, including the saved state area that is specific to the Alpha architecture. The 
firmware is also responsible for initializing the restart block, as specified by ARC. 


2. Verifies the restart block and if invalid, initiates alternate restart. 


The PALcode verifies the restart block by ensuring that the restart block signature is 
valid and that the restart block and saved state area lengths are of sufficient size to 
contain the state the PALcode saves. If the PALcode determines that the restart block 
is not valid, an alternate restart is initiated. 


The alternate restart allows the PALcode to restore the previous PALcode base to the
PAL_BASE register and to transfer control to the previous PALcode base in the
PALcode environment.


Figure 6-1 shows the structure of the PAL_BASE register.


Figure 6-1: PAL_BASE Internal Processor Register


31 PA_BITS..K K-1..0 
ADDR RAZ 


The hardware vectors into the appropriate PALcode handlers as offsets from the base 
in the PAL_BASE register. The offsets for each handler and the type of handler are 
implementation specific, except for the reset vector. The reset vector is the PALcode 
initialization vector and must begin at offset 0 within the PALcode image. 


Explicitly, PAL_BASE contains the value <PA_BITS..K>, where PA_BITS is the 
physical address bits for the implementation, and 2**K is the minimum PALcode byte 
alignment for the implementation. 


Note that the OS Loader uses 64K-byte boundaries, so the maximum value for K is 
16. The minimum value for K is N, where 2**N = implementation page size. 


3. Saves the general register state in the restart block. 


The saved general register state includes all 32 integer registers and all 32 
floating-point registers. In addition, the floating-point control register is also saved. 


4. Saves the architected internal processor register state in the restart block. 


The internal processor register state is stored in its architected format so that it may be 
interpreted in the firmware environment. In addition, remaining space is allocated so 
that the total size of the restart block is 2040 bytes. The additional space can be used 
for per-implementation data. 


5. Saves the RESTART_ADDRESS in the restart block. 


The RESTART_ADDRESS is stored in the saved state area to allow return from 
reboot via the restart instruction. The HAL is responsible for populating the Version, 
Revision, and RestartAddress fields of the restart block header. 


6. Retrieves the firmware restart address from the processor control region. 


The firmware restart address is the address to which the PALcode transfers control 
upon completion of the reboot. The firmware restart address is passed from the 
firmware through the OS Loader to the kernel and stored in the processor control 
region as is the restart block pointer. The firmware restart address is read from the 
processor control region and written to the RESTART_ADDRESS register with 
implementation-specific (but well-defined) interpretation. 


7. Restores the PALcode base from the previous PALcode base. 


The PALcode captures the previous PALcode environment when it is first initialized. 
The PALcode base address is read from the PAL_BASE register and written to the 
previous_PAL_BASE register. When the processor executes the reboot function, it 
restores the previous PALcode environment by writing the value in the 
previous_PAL_BASE register to the PAL_BASE register. 


Hardware Implementation Note:
Several restrictions are imposed on the hardware design to support this model for
switching PALcode environments:


A. The currently active PALcode must be settable by writing the base address of
the PALcode image to an internal processor register.


B. No implementation can require, for the base of the PALcode, an alignment of
greater than 64K bytes or less than the implementation page size.


C. The internal processor register used to set the base of the PALcode must be
readable for each bit that is writable.


8. Completes the restart block by updating the boot status and the checksum.


9. Restarts execution at the firmware restart address, passing a pointer to the restart block
in the a0 register.


The restart instruction is provided to reverse the work done by a reboot instruction and allows 
the processor to restart execution. The restart function performs the inverse of the tasks that 
were performed in the reboot. 
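

The PAL_BASE alignment rule described with Figure 6-1 (K is at least N, where 2**N is the
page size, and at most 16) can be checked with a few lines of arithmetic. The sketch below is
illustrative; the function names are hypothetical, and the 8K-byte page size used in the
usage note is an assumption, since page size is implementation specific.

```python
MAX_K = 16  # the OS Loader uses 64K-byte boundaries, so K cannot exceed 16


def k_bounds(page_size):
    """Return (minimum K, maximum K) for an implementation page size in bytes."""
    n = page_size.bit_length() - 1
    if page_size <= 0 or (1 << n) != page_size:
        raise ValueError("page size must be a power of two")
    return n, MAX_K


def pal_base_value(address, k):
    """Model PAL_BASE: the low K bits read as zero (RAZ), the rest is ADDR."""
    return address & ~((1 << k) - 1)


def is_valid_pal_base(address, k):
    """True if address already satisfies the 2**K byte alignment."""
    return address & ((1 << k) - 1) == 0
```

For an implementation with 8K-byte pages, k_bounds(8192) gives a K range of 13 through 16, so
an image loaded on a 64K-byte boundary satisfies every permitted alignment.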


6.2.2 Reboot and Restart Tasks and Sequence 


The tasks and sequence required for performing a reboot and restart are described below: 


1. Firmware allocates restart block, initializing signature, length, ID fields, and the pointer
to next restart block. Restart block pointer and firmware restart address are passed to
the kernel.


2. HAL populates the Version and Revision fields during HAL initialization.


3. Some external event triggers a halt, a reboot, or a power-fail.


4. The appropriate HAL routine populates the RestartAddress field of the restart block
with the address of the HAL restart routine.


5. The HAL executes the reboot instruction.


6. The PALcode saves processor state, including the RESTART_ADDRESS register (the
address in the HAL of the instruction after the reboot instruction).


7. The PALcode transfers to the firmware environment.


8. The firmware initializes a restart by calling the HAL restart routine (via the address in
the restart block header).


9. The HAL uses the swppal instruction to restore the operating system PALcode
environment.


10. The HAL uses the restart instruction to restore complete processor state.


11. The PALcode restores state and then returns execution to the instruction after the reboot
instruction in the HAL.


12. The HAL completes the restart.
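

The way the PALcode base and the restart address move during this sequence can be traced in
a small model. The sketch below is a simplification for illustration: the class, the function
names, the dictionary used as a restart block, and its key name are hypothetical, and only the
PAL_BASE, previous_PAL_BASE, and RESTART_ADDRESS registers are modeled.

```python
class Processor:
    def __init__(self, firmware_pal_base):
        self.pal_base = firmware_pal_base   # firmware PALcode is active at power-up
        self.previous_pal_base = None
        self.restart_address = None


def pal_initialize(cpu, os_pal_base):
    # OS PALcode initialization captures the previous base, then installs its own.
    cpu.previous_pal_base = cpu.pal_base
    cpu.pal_base = os_pal_base


def reboot(cpu, restart_block, hal_return_address, firmware_restart_address):
    # Save where the HAL should resume, then hand control to the firmware.
    restart_block["saved_restart_address"] = hal_return_address
    cpu.restart_address = firmware_restart_address
    cpu.pal_base = cpu.previous_pal_base
    return cpu.restart_address              # execution continues in firmware


def swppal(cpu, new_pal_base):
    # Simplified: only the switch of the active PALcode base is modeled.
    cpu.pal_base = new_pal_base


def restart(cpu, restart_block):
    # Inverse of reboot: resume at the instruction after the reboot in the HAL.
    cpu.restart_address = restart_block["saved_restart_address"]
    return cpu.restart_address
```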


6.2.3 Swppal Instruction — Transition to Any PALcode Environment


The swppal instruction is a flexible interface that allows kernel code to transition to any
PALcode environment, as contrasted with reboot, which limits the caller to transition to the
previous PALcode environment.


6.3 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP → Alpha
2. Windows NT AXP → Windows NT Alpha
3. Updated for NT 4.0
4. Added initpcr information


Revision 6.0, December 12, 1994
1. Created from Windows NT AXP specifications


Windows NT Alpha Software Index


A 


Access violation fault, 4—3 
Address space, 3-1 


Address space match (ASM) 
bit in PTE, 3-5 
with context switch, 2-10, 5—35 
Address space number (ASN) register, 2—4 
with context switch, 2—10 
Address translation, page table structure, 3-2 


APC_LEVEL, IRQL table index name, 2-2 
ARC Restart Block, 5—23 


Arithmetic exceptions, 4—5 





Arithmetic traps, 4-5 
division by zero, 4-6 
inexact result, 4—6 
integer overflow, 4-6 
invalid operation, 4-6 
overflow, 4-6 
underflow, 4—6 


ASN_wrap_indicator, 2—10 


Asynchronous procedure call (APC) 
SIRR register field for, 4-16 


B


Boot environment, restoring, 5-23
Boot sequence, establishing, 1-2 
bpt (PALcode) instruction, 5-46 
Breakpoint exceptions, 4-8 


C 


Cache blocks, virtual 
invalidating all, 5—36 
invalidating multiple, 5—37 
invalidating single, 5-39 

Cache coherency, 2-8 
HAL interface for, 1-3 


callkd (PALcode) instruction, 5—47 
callsys (PALcode) instruction, 5-48
Catastrophic errors, 4—19 

CLOCK_HIGH, IRQL table index name, 2-2 
Console firmware, transferring to, 5—23 


Context switching 


between address spaces, 5-35 
PDR register with, 3-3 
thread, 5-30 

thread to process, 2—10 
thread to thread, 2-10 


Conventions, code flows, 1-4 


csir (PALcode) instruction, 5—4 
clears software interrupts, 4-16 


D 


dalnfix (PALcode) instruction, 5-5, 5—9 





DATA_BUS_ERROR code, 4-18


Deferred procedure call (DPC) 


SIRR register field for, 4-16 
stack for, 2-9 


DEVICE_HIGH_LEVEL, IRQL table index name, 
2-2 

DEVICE_LEVEL, IRQL table index name, 2-2 

di (PALcode) instruction, 5—6 

Dirty pages, tracking, 3-5 

DISPATCH_LEVEL, IRQL table index name, 2-2 

Division by zero bit, exception summary register, 
4-6 

Division by zero trap, 4-6 

DMA control, HAL interface for, 1-3 

DMK bit, machine check error summary register,
4-18 

DPC bit, machine check error summary register, 
4-18 

draina (PALcode) instruction, 5-7 

with machine checks, 4—18 
DSC bit, machine check error summary register,
4-18
dtbis (PALcode) instruction, 3-5, 5-8


DZE bit, exception summary register, 4—6 


E 


ealnfix (PALcode) instruction, 5-9 


ei (PALcode) instruction, 5—10 
as synchronization function, 4-15 
Errors, correctable, 4—16 


Errors, uncorrectable, 4—17 
Exception classes, 4—1 
registry of handling routines for, 5-41 
values for, 5-42 
Exception dispatch, 4-1 
Exception handling routines, registry for, 5-41
Exception summary register, 4-6
EXCEPTION_SUMMARY, 4-6 
ExceptionPC address, 4—5 


Exceptions 
arithmetic, 4—5 
breakpoint, 4-8 
general class common dispatch, 4-10 
general class of, 4-4 
illegal instruction, 4-7 
initializing entry points, 6—2 
invalid address, 4—7 
memory management class, 4-3 
returning from, 4—2, 5-27 
software, 4-8 
subsetted IEEE, 4-9 
system service calls, 4—4 
trap frames with, 4-3 
unaligned access, 4-7 


F 


Fault on write (FOW), bit in PTE, 3-5
Firmware components, 1-2 

Firmware restart, 2—7 

Firmware restart address, 6—4 
FLOAT_REGISTER_MASK, 4-5 


G 


General class exceptions, 4—4 
common dispatch of, 4-10 
General exception address (GENERAL_ENTRY)
register, 2—4 





gentrap (PALcode) instruction, 5-50 
raises software exceptions, 4-8 
Global translation hint, 3-5 
Granularity hint (GH), bits in PTE, 3-5


H 


HAL (Hardware abstraction layer), 1-3 


halt (PALcode) instruction, 5-11 


writes PAL_BASE register, 2-5 
See also reboot (PALcode) instruction 


Hardware abstraction layer, interfaces for, 1-3 
Hardware errors, when unrecoverable, 4—10 
Hardware interrupts, 4-13 

HIGH_LEVEL, IRQL table index name, 2-2 


I


I/O access, nonmapped, 3-1
I/O support, HAL interface for, 1-3 
IEEE, subsetted instruction exception, 4-9 
IKSP register. See Kernel stack pointer, initial 
Illegal instruction exceptions, 4—7 
imb (PALcode) instruction, 5-51 
INE bit, exception summary register, 4-6 
Inexact result bit, exception summary register, 4—6 
Inexact result trap, 4—6 
Initialization, PALcode environment, 6-1 
initpal (PALcode) instruction, 5-12, 5-14 
at initialization, 6—2 
reads PAL_BASE register, 2-5 
writes KGP register, 2—4 


writes PCR register, 2-5 
writes PDR register, 2-6 


initpcr (PALcode) instruction, 5—14 

Integer overflow bit, exception summary register, 
4-6 

Integer overflow trap, 4—6 

INTEGER_REGISTER_MASK, 4-6 


Internal processor registers (IPR) 


address space number, 2—4 

general exception address, 2—4 
interrupt exception address, 2-4 
kernel global pointer, 2-4 

kernel stack pointer (IKSP), initial, 2-4 
machine check error summary, 2-5 
memory management exception, 2-5 
page directory base, 2-6 

PALcode image base address, 2-5 
panic exception, 2-5 

process control region base, 2-5 
processor status, 2-6 

restart execution address, 2-6 
returning state of, 5-21 

software interrupt request, 2-6 
summarized, 2-2

system service exception address, 2-6 
thread environment block base, 2-6 
thread unique value, 2-7 


Interrupt acknowledge, 4-15 
Interrupt dispatch 


example, 4-13 
table (IDT), 4-13 


Interrupt enable mask, 4—12 
Interrupt exception address (INTERRUPT_ENTRY)
register, 2-4 

Interrupt handling 

HAL interface for, 1-3 
Interrupt level table (ILT), 4-12

index values/names for, 2-2
Interrupt mask table (IMT), 4-12
Interrupt request levels (IRQL)

ILT table for, 4-12 

in PSR, 2-1 

PSR and di instruction, 5—6 

swapping, 5-32 
Interrupt tables (IDT, ILT, IMT), 2-7
Interrupt tables, at initialization, 6-3
Interrupt trap frame, building, 4-14
Interrupt vectors, mask table for, 4-12
Interrupts, 4—12 

disabling, 5-6 

enabling, 5-10 

processor status register and, 2-1 


returning from, 5-27 
software requests for, 4—15 


intr_flag register 


cleared by retsys, 5-26 
cleared by rfe, 5-28 


INV bit, exception summary register, 4—6 
Invalid address exceptions, 4—7 


Invalid operation bit, exception summary register, 
4-6 


Invalid operation trap, 4—6 

IOV bit, exception summary register, 4—6 
IPI_LEVEL, IRQL table index name, 2-2 
IRQL 


See Interrupt request levels 
See also rdirql and swpirql 


K 


kbpt (PALcode) instruction, 5-52 


Kernel global pointer (KGP) register, 2—4 
at initialization, 6—2 
initializing, 5-12 

Kernel stack, 2-9 
under/overflow detection, 5-54
Kernel stack pointer (IKSP), initial, 2-4 

initializing, 5-12 

returning contents of, 5-17 

swapping to current, 5-33 

with context switch, 2-10, 5-31 

with trap frames, 4-3 


Kernel stack, when corrupted, 4-10 
KERNEL_BREAKPOINT breakpoint type, 4-9 


L 


lock_flag register 
cleared by retsys, 5-26 
cleared by rfe, 5-28 


M


Machine check error handling, 4-17





Machine check error summary (MCES) register, 2—5 
format of, 4—17 
returning contents of, 5-18 
writing values to, 5-43 
Machine checks 
catastrophic conditions with, 4—19 
classes of, 4-16 
disabling during debug, 4-18
sources for, 4-16 
type codes, 4-18 
unrecoverable reported, 4-18 
MCK bit, machine check error summary register, 
4-18 
Memory management, 3-1 


Memory management exception 
(MEM_MGMT_ENTRY) register, 2-5


N 





Nonmapped address space, 3-1


O 


OS Loader, 1-2 

Overflow bit, exception summary register, 4—6 
Overflow trap, 4-6

OVF bit, exception summary register, 4—6 


P


Page directory base (PDR) register, 2-6 
initializing, 5-12 
maps PTEs, 3-3 
with context switch, 5-35 

Page directory entry (PDE), 3-3 
Page frame number (PFN)
bits in PTE, 3-4
in PTE, 3-2
when a PDR, 3-3
with context switch, 2-10, 5-31 


Page table entry (PTE) 


page frame number (PFN) with, 3-2 
summary of, 3-4 


Page tables 


physical traversal algorithm, 3-3 
traversing, 3-3 


PALcode 


argument registers used, 5-1 
debugging, 5-54 

event counters during debug, 5-55 
initial processor context for, 6—2 
initializing environment for, 6—1 
internal software registers, 5—15 
kernel activates, 1-2 

OS Loader and, 1-2 

swapping currently executing, 5-34 
unexpected exceptions in, 4—11 
version control, 2-7 


PALcode image base address (PAL_BASE) register, 
2-5 
from initpal, 5-12 
previous, 6—4 
structure of, 6-3 


PALcode instructions 
Windows NT Alpha privileged (list), 5—2 
Windows NT Alpha unprivileged (list), 5-45 
PALcode instructions, Windows NT Alpha privileged 


clear software interrupt request, 5—4 

data TB invalidate single, 5-8 

disable alignment fixups, 5-5, 5-9 

disable all interrupts, 5-6 

drain all aborts, 5—7 

enable alignment fixups, 5-9 

enable interrupts, 5—10 

halt operating system, 5—11 

initialize PALcode data structures, 5—12, 5-14 

initialize processor control region data, 5-14 

read current IRQL, 5-16 

read initial kernel stack pointer, 5—17 

read internal processor state, 5-21 

read machine check error summary register, 5-18

read processor (PSR) status register, 5-20 

read processor control region base address, 
5-19 

read software event counters, 5-15 

read thread value, 5-22 

restart operating system, 5—24 

return from exception or interrupt, 5-27 

return from system service call exception, 5-25

set software interrupt request, 5—29 

swap current IRQL, 5-32 

swap current PALcode, 5-34

swap initial kernel stack pointer, 5-33 

swap process context, 5—35 

swap thread context, 5-30 
transfer to console firmware, 5-23

translation buffer invalidate all, 5-36 

translation buffer invalidate multiple, 5—37 

translation buffer invalidate multiple for ASN, 
5-38 

translation buffer invalidate single, 5—39 

translation buffer invalidate single for ASN, 5-40


write kernel exception entry routine, 5-41 

write machine check error summary register, 

5—43 

write performance monitor, 5—44 
PALcode instructions, Windows NT Alpha 

unprivileged 

breakpoint trap, 5—46 

call kernel debugger, 5—47 

generate a trap, 5-50 

instruction memory barrier, 5-51 

kernel breakpoint trap, 5-52 

read TEB pointer, 5-53 

system service call, 5-48 


Panic exception (PANIC_ENTRY) register, 2-5 


Panic exceptions, 4-10 


kernel stack under/overflow, 5-54 
trap from and dispatch for, 4-11 


Panic stack, 2-9 
Panic stack pointer, 2-7 
PANIC_STACK_SWITCH code, 4-10 
PASSIVE_LEVEL, IRQL table index name, 2-2 
PCE bit, machine check error summary register, 
4-18 
Physical address space, 3—2 
Physical address translation, 3-2 
Pre-PALcode initialization, 6—1 
Privileges, processor, 2-2 
Process control region base (PCR) register, 2-5 
Processor control block (PRCB) 
at initialization, 6—2 
Processor control region, 2-7 
interrupt tables with, 2-7 
Processor control region base (PCR) register 
at initialization, 6—2 
initializing, 5—12 
returning contents of, 5-19 
Processor correctable errors, 4—16 
reporting, 4-18 
Processor data areas, 2—7 
Processor modes, 2-1 
Processor state, internal, initialized, 6—1 
Processor status (PSR) register, 2—1, 2-6 
returning contents of, 5—20 
Processor uncorrectable errors, 4—17 
PSR. See Processor status register


R 


rdcounters (PALcode) instruction, 5-15 
rdirql (PALcode) instruction, 5-16


rdksp (PALcode) instruction, 5-17 


reads IKSP register, 2-4 
reads kernel stack, 2-9 


rdmces (PALcode) instruction, 5—18 

rdpcr (PALcode) instruction, 5-19 
reads PCR register, 2-5

rdpsr (PALcode) instruction, 5-20 

rdstate (PALcode) instruction, 5—21 

rdteb (PALcode) instruction, 5-53 
reads TEB register, 2-6 

rdthread (PALcode) instruction, 5—22 
reads THREAD register, 2-7 

reboot (PALcode) instruction, 5—23 


operation of, 6-3 
tasks and sequence for, 6—5 


Register mask, floating-point and integer, 4—5 
Registers, Windows NT Alpha usage of, 1-3 
restart (PALcode) instruction, 5-24 

tasks and sequence for, 6—5 
Restart block, with catastrophic errors, 4—19 
Restart block pointer, 2—7, 6-3 
Restart execution address (RESTART_ADDRESS) 

register, 2-6 

at PALcode exit, 5-1 
retsys (PALcode) instruction, 5—25 

use of, 4—2 
rfe (PALcode) instruction, 5-27 


compared to retsys, 5—25 
use of, 4-2 


S 


SCE bit, machine check error summary register, 
4-18 








Software completion bit, exception summary register, 


4-6 
Software exceptions, 4-8


Software interrupt request (SIRR) register, 2—6 
clearing, 5-4 
format for, 4—15 
See also Software interrupts 

Software interrupts 
requesting, 4—15 
requests after exception handling, 5-25, 5-27 
setting, 5-29 
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ssir (PALcode) instruction, 5-29 

sets software interrupts, 4-16 
STATUS_ALPHA_ARITHMETIC code, 4—5 
STATUS_ALPHA_GENTRAP code, 4-8 
STATUS_BREAKPOINT code, 4-9 
STATUS_DATATYPE_MISALIGNMENT code, 

4-7 

STATUS_ILLEGAL_INSTRUCTION code, 4-7 
STATUS_INVALID_ADDRESS code, 4—7 
Superpage address space, 3-1 
SWC bit 

exception summary register, 4—6 
swpctx (PALcode) instruction, 5-30 

PDR register with, 2-6 

writes IKSP register, 2-4 


writes TEB register, 2-6 
writes THREAD register, 2-7 


swpirql (PALcode) instruction, 5-32 
as synchronization function, 4—15 
swpksp (PALcode) instruction, 5-33 


reads kernel stack, 2—9 
writes IKSP register, 2-4 


swppal (PALcode) instruction, 5-34, 6-6 
firmware contributes, 1—2 

swpprocess (PALcode) instruction, 5—35 
writes PDR register, 2-6 

Synchronization levels, interrupt, 4-13 

System correctable errors, 4—16 
reporting, 4-18 

System service call exceptions, 4—4 
returning from, 5-25 

System service exception address 

(SYSCALL_ENTRY) register, 2-6 


System uncorrectable errors, 4-17 


T 


tbia (PALcode) instruction, 3-6, 5-36 
tbim (PALcode) instruction, 3-6, 5—37 
tbimasn (PALcode) instruction, 3-6, 5—38 
tbis (PALcode) instruction, 3-6, 5-39 
tbisasn (PALcode) instruction, 3-6, 5—40 
Temporary PALcode registers, 5-1 


Thread environment block base (TEB) register, 2-6 
initializing, 5—12 
returning contents of, 5—53 
with context switch, 2—10, 5-31 

Thread unique value (THREAD) register, 2-7 
initializing, 5-12 
returning contents of, 5-22 
with context switch, 2-10, 5-31
Timer support, HAL interface for, 1-3


Translation buffer (TB) 


at context switch, 2-10 
invalidate all, 5-36 
invalidate multiple, 5-37 
invalidate single, 5—39 
invalidate single data, 5-8 
management of, 3-5 
recursion in, 3-6 


Translation not valid faults, 4-3
Trap frames and offsets, 4-3 
TRAP_CAUSE_UNKNOWN code, 4-11 


TrFir trap frame offset 
from ExceptionPC address, 4—5 


U 


Unaligned access exceptions, 4—7 

Underflow bit, exception summary register, 4—6 
Underflow trap, 4-6 

UNF bit, exception summary register, 4—6 

User stack, 2-9 

USER_BREAKPOINT breakpoint type, 4-9 


V 
Valid (V), bit in PTE, 3-5 


Virtual address space, 3-1 
Virtual address translation, 3-3 


Virtual addresses 
format of, 3-2 
non-canonical at fault, 4—7 
physical view of, 3-3 
virtual view of, 3-3 
Virtual cache blocks
invalidating all, 5-36
invalidating multiple, 5-37
invalidating single, 5-39


W 


wrentry (PALcode) instruction, 5-41 
at initialization, 6-2 
writes GENERAL_ENTRY register, 2-4 
writes INTERRUPT_ENTRY register, 2—4 
writes MEM_MGMT_ENTRY register, 2—5 
writes PANIC_ENTRY register, 2-5 
writes SYSCALL_ENTRY register, 2-6 




wrmces (PALcode) instruction, 5-43


wrperfmon (PALcode) instruction, 5-44


Console Interface Architecture (III)


This part describes an architected console interface and contains the following chapters:


• Chapter 1, Console Subsystem Overview (III)
• Chapter 2, Console Interface to Operating System Software (III)
• Chapter 3, System Bootstrapping (III)
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Chapter 1 


Console Subsystem Overview (III)


On an Alpha system, underlying control of the system platform hardware is provided by a
console.¹ The console:


• Initializes, tests, and prepares the system platform hardware for Alpha system software.

• Bootstraps (loads into memory and starts the execution of) system software.

• Controls and monitors the state and state transitions of each processor in a
multiprocessor system in the absence of operating system control.

• Provides services to system software that simplify system software control of and
access to platform hardware.

• Provides a means for a "console operator" to monitor and control the system.


The console interacts with system platform hardware to accomplish the first three tasks. The 
mechanisms of these interactions are specific to the platform hardware; however, the net 
effects are common to all systems. Chapter 3 describes these functions. 


The console interacts with system software once control of the system platform hardware has 
been transferred to that software. Chapter 2 discusses the basic functions of a console and its 
interaction with Alpha system software. 


The console interacts with the console operator through a virtual display device or console ter- 
minal. The console operator may be a person or a management application. The console 
terminal forms the interface between the console and a console presentation layer. The func- 
tions of that presentation layer and the display formats are described in Section 1.3. 


An Alpha multiprocessor system has one primary processor and one or more secondary
processors. The primary processor:

• Can legally refer to the console I/O devices

• Can legally send characters to the console terminal

• Can legally receive characters from the console terminal

• Has direct access to a BB_WATCH on the system

• Is named in response to an inquiry as to which processor is primary


All other processors in the system are secondary processors. 


¹ A term shrouded in the antiquity of computing. So named because this mechanism was first
realized as a desk-like panel of switches and blinking lights.
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1.1 Console Implementations 


The implementation of an Alpha console varies from system to system. Regardless of imple- 
mentation, the console on each system provides the functionality described in this chapter and 
in Chapters 2 and 3. The console may be implemented as:


• "Embedded," or co-resident in the hardware platform complex that contains the
processors

• "Detached," or resident on a separate hardware platform

• Any hybrid of the above


The distinction is somewhat arbitrary. A detached console may have cooperating special code 
that executes on one of the processors; an embedded console may have a cooperating manage- 
ment application that executes on a remote machine. 


Regardless of the actual implementation, each console must provide: 


• A virtual display device, the default "console terminal."
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This device allows the console operator to issue commands and receive displays. With 
no hardware errors and with the proper console-lock setting, the default console 
terminal device provides reliable communication with the rest of the console. 


• Reliable access to console functionality by system software and the console operator.


All console functionality must appear to reside within the console at all times. All 
console functions must be accessible in a timely manner, without prior notification, 
and reliably. 


• Secure communications with system software and the console operator.


All console communication paths must be able to be made secure by either physical 
measures or encryption methods. 


• A mechanism by which the console can gain control of a processor that is executing
system software.


This mechanism must preserve the execution state of system software; it must be
possible for the console to gain control of the processor and subsequently continue
system software execution successfully.


• A mechanism that locks the console.


A console lock prohibits the user from accessing a selected subset (or all) of console 
functions. The console lock may be a console password, a key switch, jumper, or any 
other implementation-specific mechanism. The lock is either "locked" or "unlocked." 


1.2 Console Implementation Registry


This chapter, and Chapters 2 and 3, specify required console functions. Some of these
functions have attributes that may vary with console implementation; consoles may also provide
more than the required functions. Console functions or attributes that may vary with
implementation include:


• Supported console terminal blocks (CTBs)

• Supported environment variables

• Environment variable value formats, such as BOOT_DEV or BOOT_OSFLAGS

• Configuration data block format

• Supported callback routines

• Supported bootstrap media

• Implementation-specific HALT codes or messages


The goal of the Alpha console architecture is to promote a consistent interface across all Alpha 
systems. Some console functionality is inherently implementation specific and cannot be 
required of all Alpha systems; some may be applicable to more than one Alpha system. To pre- 
vent the proliferation of interfaces and achieve commonality of function whenever possible, 
the Alpha console architecture requires that: 


• Any console function that is visible to system software and is not specified by these
chapters must be registered with the Alpha architecture group.


• Any console function that is visible to an on-site or remote console operator (including
Field Service engineers) and is not specified by these chapters must be registered with
the Alpha architecture group.


• Whenever possible, implementations must use previously registered functions rather
than inventing new variations.


Console functions intended for use solely by development engineering or expert-level repair 
and diagnosis are excluded from these requirements. 


1.3 Console Presentation Layer 


The following functions are assumed to be provided in the console presentation layer: 


• BOOT (bootstrap the system)

• CONTINUE (continue execution)

• START-CPU (start a given secondary)

• INITIALIZE (initialize system)

• INITIALIZE-CPU (initialize a given processor)

• HALT-CPU (force a given processor into console I/O mode)

• HALT-CRASH (cause a given processor to initiate a crash)


1.4 Messages 


The console generates a binary message code to the console presentation layer to signal mes- 
sages, such as audit trail or error messages. The console presentation layer interprets the 
binary code into something that is meaningful to the console operator. 
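
The division of labor described above, where the console emits only a binary code and the presentation layer chooses the text, can be sketched as a table lookup. This is a minimal illustration rather than the architected interface: the code values, the catalog contents, the language tags, and the function name `present` are all hypothetical.

```python
# Hypothetical message catalogs, keyed by language tag and then by message code.
# Real code values and message texts are implementation specific.
CATALOGS = {
    "en": {0x01: "Bootstrap started", 0x02: "Processor halted"},
    "de": {0x01: "Bootstrap gestartet", 0x02: "Prozessor angehalten"},
}


def present(code: int, language: str) -> str:
    """Turn a binary message code into text for the console operator."""
    # Fall back to the English catalog when the language is not supported.
    catalog = CATALOGS.get(language, CATALOGS["en"])
    # Unknown codes are still shown, so that nothing is silently dropped.
    return catalog.get(code, f"Unknown message code {code:#04x}")


print(present(0x02, "de"))
print(present(0x7F, "en"))
```

Because the console itself never handles message text, adding a language in this sketch means adding a catalog and leaves the code path that generates messages unchanged.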
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1.5 Security 


The means by which the console achieves a secure communications path with system software 
and with the console operator is implementation specific. Embedded consoles have the built-in 
capability of secure communications with system software. Detached consoles can achieve 
this security by residing in the same room as the Alpha system and communicating with it 
over a private connection. Detached consoles can also achieve security by using an encrypted 
protocol over a shared connection. This latter method allows a workstation over a network to 
function as the console. 


1.6 Internationalization 


Wherever possible, console implementations should support the goals of internationalization: 


• Each message has a binary message code. The console presentation layer interprets the
code into a meaningful message display of the appropriate language and characters.


• Consoles should avoid explicitly interpreting character set encoding (such as ISO
Latin-1). Character strings are to be viewed as simple byte strings. Thus, the GETC
console callback routine supports one-to-four-byte character encodings, depending
on the currently selected language and character set; the PUTS routine outputs only
a byte stream.


• ASCII strings are used in certain fields of the HWRPB and certain interprocessor
communications due to DEC Standard 12 and to present a common interface to system
software.


• The currently selected character set encoding and language to be used for the console
terminal are defined by the CHAR_SET and LANGUAGE environment variables.


• The end of a character string passed between the console and the operating system as an
argument to a console callback routine is determined by passing its length.


• Console callback routines should be written to be independent of character set
encoding and language. At a minimum, every implementation must support ISO Latin-1
character set encodings, which requires the following properties:


1. The GETC console callback routine returns a one-byte character (see Section 2.3.4).


2. The PROCESS_KEYCODE console callback routine returns a one-byte character
(see Section 2.3.4).


3. English console presentation layers are strongly encouraged to use the actual values
as defined in Table 2-7, rather than creating aliases.


Other supported character set encodings are determined by platform product
requirements.


• The console presentation layer is independent of the required console functionality
interface.
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1.7 Documentation Note 


The chapters in Section III apply to both OpenVMS Alpha and DIGITAL UNIX operating sys- 
tems. The few functional descriptions that are unique to one operating system are described as 
such. However, because of contextual equivalence in this section and in the interests of brev- 
ity, any text concerning the OpenVMS Alpha hardware privileged context block (HWPCB) 
applies equally to the DIGITAL UNIX privileged context block (PCB). Equivalent information
for Windows NT Alpha is located in Windows NT Alpha Software (II-C), Chapter 6.


1.8 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP --> Alpha
2. Digital --> DIGITAL
3. DEC OSF/1 --> DIGITAL UNIX
4. Windows NT AXP --> Windows NT Alpha
5. Pointer to Windows NT Alpha section, Chapter 6 for firmware info


Revision 6.0, December 12, 1994
1. "Operator Interface" removed from title
2. Audit trail error messages removed and placed in Appendix E
3. Console lock mechanisms section (was TBD) removed
4. ISO Latin-1 support section rolled into Internationalization section
5. Alpha --> Alpha AXP


Revision 5.0, May 12, 1992
1. Reorganized according to SRM Rev 5 requirements
2. Converted to SDML
3. Replace previous Console Chapter with Console ECO #15
4. Includes 3 chapters and two appendices, renumber I/O Chapter
5. Material substantially changed or rearranged
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Chapter 2 


Console Interface to Operating System Software (III)


This chapter describes the interactions between the console subsystem and system software. 
These services depend on state that is shared between the console and system software. Shared 
state is contained in the Hardware Restart Parameter Block (HWRPB) and a number of envi- 
ronment variables. The HWRPB is a data structure that is directly accessed by both the 
console and system software; the environment variables are indirectly accessed by system soft- 
ware. Specifically: 


• Section 2.1 describes the HWRPB.

• Section 2.2 describes the environment variables.

• Section 2.3 describes the service, or callback, routines provided by the console to
system software.

• Section 2.4 describes the communication between the console and system software.


2.1 Hardware Restart Parameter Block (HWRPB) 


The Hardware Restart Parameter Block (HWRPB) is a page-aligned data structure that is 
shared between the console and system software. The HWRPB is a critical resource during 
bootstraps, powerfail recoveries, and other restart situations. An overview of the HWRPB is 
shown in Figure 2—1. The individual HWRPB fields are shown in Figure 2—2 and described in 
Table 2-1. 


The console creates the HWRPB and the required per-CPU, CTB, CRB, MEMDSC, and
DSRDB offset blocks as a physically contiguous structure during console initialization. Fields
within the HWRPB and the required offset blocks are updated by the console and system
software during and after system bootstrapping. The console must be able to locate the HWRPB
and the required offset blocks at all times. Neither the console nor system software may move
the HWRPB or the required offset blocks to different physical memory locations; subsequent
operation of the system is UNDEFINED if such an attempt is made.
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Figure 2-1: HWRPB Overview 


[Figure: the HWRPB holds general information followed by the offset fields (TBB Offset,
Per-CPU Offset, CTB Table Offset, CRB Offset, MEMDSC Offset, CONFIG Offset, FRU Table
Offset, Restart Routine Linkage Pair, DSRDB Offset). These fields locate the CPU Restart
Routine, the Translation Buffer Hint Block (TBB), the Per-CPU Slots (whose PALcode
Pointers and Logout Area Pointers locate the PALcode Spaces and CPU Logout Areas), the
Console Terminal Block (CTB) Table, the Console Routine Block (CRB) (whose CRB Map
Entries locate the CRB Pages), the Memory Data Descriptor Table (whose bitmap pointers
locate the Cluster #1 through Cluster #n Bitmaps), the Optional Configuration Data
Block (CONFIG), the Optional Field Replaceable Unit Table (FRU), and the Dynamic System
Recognition Data Block (DSRDB).]





DIGITAL Restricted Distribution 




The HWRPB and the required offset blocks must comprise a virtually contiguous structure at 
all times. Before transferring control to system software, the console maps the HWRPB and 
the required offset blocks into contiguous addresses beginning at virtual address 0000 0000
1000 0000₁₆ in the initial bootstrap address space. If system software subsequently changes
this virtual mapping, any new mapping must preserve the relative offsets of all fields and 
blocks; all physically contiguous pages must remain virtually contiguous. Some of the data 
structures located by HWRPB fields need not be contiguous with the HWRPB. The structures 
that may be discontiguous are the PALcode spaces, the logout areas, the CRB pages, and the 
memory bitmaps located by the MEMDSC table. 


All offset blocks must be at least quadword aligned. The starting address of an offset block is 
determined by adding the contents of the HWRPB offset field to the starting address of the 
HWRPB. For example, the starting address of the MEMDSC block is given by: 


MEMDSC Address = HWRPB address + MEMDSC OFFSET 
= HWRPB address + (HWRPB[200]) 
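
The address calculation above can be sketched in a few lines. This is an illustration under stated assumptions: the HWRPB is available as a little-endian byte image, and the function name, the sample offset value 1400 (hex), and the image size are made up for the example. The base address is the initial bootstrap virtual address given earlier.

```python
import struct

MEMDSC_OFFSET_FIELD = 200  # byte offset of the MEMDSC OFFSET field in the HWRPB


def offset_block_address(hwrpb_base: int, hwrpb: bytes, field_offset: int) -> int:
    """Return the start address of the block located by one HWRPB offset field."""
    # Each offset field is an unsigned quadword (little-endian assumed here).
    (block_offset,) = struct.unpack_from("<Q", hwrpb, field_offset)
    # Block address = HWRPB address + contents of the offset field.
    return hwrpb_base + block_offset


# Toy HWRPB image: 320 zero bytes with a made-up MEMDSC OFFSET value.
image = bytearray(320)
struct.pack_into("<Q", image, MEMDSC_OFFSET_FIELD, 0x1400)

memdsc = offset_block_address(0x10000000, bytes(image), MEMDSC_OFFSET_FIELD)
print(hex(memdsc))
```

The same helper applies to any of the other offset fields, since each block is located independently of the others.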


The total size of the HWRPB and the required offset blocks is on the order of 8K bytes to 16K 
bytes. The size is contained in the HWRPB_SIZE field at HWRPB[24]. The required offset 
blocks may be offset from the HWRPB in any order; the HWRPB offset fields must not be 
used to infer the size of the HWRPB or any offset block. 
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Figure 2-2: Hardware Restart Parameter Block Structure 
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Figure 2-2: Hardware Restart Parameter Block Structure (Continued) 


63                                                          0
Virtual Address of CPU Restart Routine                      :+256
Procedure Value of CPU Restart Routine                      :+264
Reserved for System Software                                :+272
Reserved for Hardware                                       :+280
HWRPB Checksum                                              :+288
RXRDY Bitmask                                               :+296
TXRDY Bitmask                                               :+304
Offset to Dynamic System Recognition Data Block Table       :+312

Translation Buffer Hint Block                               :+(HWRPB[136])
Per-Processor Slots                                         :+(HWRPB[160])
Console Terminal Block                                      :+(HWRPB[184])
Console Callback Routine Block                              :+(HWRPB[192])
Memory Data Descriptor Table                                :+(HWRPB[200])
Optional Configuration Data Block                           :+(HWRPB[208])
Optional Field Replaceable Unit Table                       :+(HWRPB[216])
Dynamic System Recognition Data Block                       :+(HWRPB[312])




Table 2-1: HWRPB Fields 


Offset Description


HWRPB (+00) HWRPB PA¹
Starting physical address of the HWRPB field. This field is used by the
console to validate the HWRPB.


+08 HWRPB VALIDATION¹
Quadword containing "HWRPB<0><0><0>" (0000 0042 5052
5748₁₆). This field is used by the console to validate the HWRPB.


+16 HWRPB REVISION¹
Format of the HWRPB. See Section 2.1.1. The HWRPB revision level
for this version of the architecture specification is 7.

Version   Interpretation
0         Reserved
1         Revision 1.1-2.1 (ADU only)
2         Revision 3.0
3         Revision 3.3 (ECO #30)
4         Revision 4 (ECO #39)
5         Revision 5 (ECO #56)
6         Revision 6 (ECO #71)
7         Revision 7 (ECO #112)
Other     Reserved for future use


+24 HWRPB SIZE¹
Size in bytes of the HWRPB and required physically contiguous TBB,
per-CPU, CTB, CRB, MEMDSC, CONFIG, FRU, and DSRDB offset
blocks. Unsigned field.


+32 PRIMARY CPU ID¹,³
WHAMI of the primary processor. System software modifies this field
only at primary switch; see Section 3.5.6. Unsigned field.


+40 PAGE SIZE¹
Number of bytes within a page for this Alpha processor implementation.
Unsigned field.


+48 PA SIZE¹
Size of the physical address space in bits for this Alpha processor
implementation. PA SIZE must be 48 bits or less. Unsigned 32-bit
field.




Table 2-1: HWRPB Fields (Continued) 


Offset Description 


+52 EXTENDED VA SIZE? 
Size of the extended portion of the virtual address space in bits for this 
Alpha processor implementation. Unsigned 32-bit field. 


If this implementation is operating with three levels of page table, this 
field contains a zero, indicating that no "extension" bits exist. 


If this implementation is operating with a fourth level of page table, 
this field contains the number of additional virtual address bits that 
exist beyond the number that would exist if operating with only three 
levels of page table. 


+56 MAX VALID ASN¹
Maximum ASN value allowed by this Alpha processor implementation.
Unsigned field.


+64 SYSTEM SERIAL NUMBER¹
Full DEC STD 12 serial number for this Alpha system. This octaword
field contains a 10-character ASCII serial number determined at the
time of manufacture; see DEC STD 12 for format information. See
Section 2.1.1.1.


+80 SYSTEM TYPE¹
Family or system hardware platform. See Section 2.1.1. Unsigned
field.


+88 SYSTEM VARIATION¹
Subtype variation of the system. This may include the member of the
system family and whether the system has optional features such as
multiprocessor support or special power supply conditioning. See
Sections 2.1.1 and 2.1.1.2 for optional features.


+96 SYSTEM REVISION CODE¹
DEC STD 12 revision field for this Alpha system. Four ASCII characters.
See Section 2.1.1.1.


+104 INTERVAL CLOCK INTERRUPT FREQUENCY¹
Number of interval clock interrupts per second (scaled by 4096) in this
Alpha system. Interrupts occur only if enabled. Unsigned field.


+112 CYCLE COUNTER FREQUENCY¹
Number of SCC and PCC updates per second in this Alpha system. See
the RPCC instruction and, for OpenVMS Alpha, the CALL_PAL
RSCC instruction. Unsigned field.
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Table 2-1: HWRPB Fields (Continued) 


Offset Description


+120 VIRTUAL PAGE TABLE BASE²,³
Virtual address of the base of the entire page table structure. The console
sets this field at system bootstraps and restores the virtual page
table base register (pointer) with this value at all processor restarts.
System software is responsible for updating this field whenever the virtual
page table base register (pointer) is modified. See Sections 3.4.1.3,
3.4.3.5, and 3.5.1.


+128 Reserved for architecture use; SBZ.


+136 TB HINT OFFSET¹
Unsigned offset to the starting address of the Translation Buffer Hint
Block (TBB). See Section 2.1.2.


+144 NUMBER OF PROCESSOR SLOTS¹
Number of per-CPU slots present. Must be a number between 1 and 64,
inclusive. See Section 2.1.3 for the per-CPU slot format. Unsigned
field.


+152 PER-CPU SLOT SIZE¹
Size in bytes of each per-CPU slot rounded up to the next integer multiple
of 128. See Section 2.1.3. Unsigned field.


+160 CPU SLOT OFFSET¹
Unsigned offset to the first per-CPU slot in the HWRPB. See Section
2.1.3.


+168 NUMBER OF CTB¹
Number of Console Terminal Blocks (CTBs) contained in the CTB
table. See Section 2.3.8.2. Unsigned field.


+176 CTB SIZE¹
Size in bytes of the largest Console Terminal Block (CTB) contained
in the CTB table. See Section 2.3.8.2. Unsigned field.


+184 CTB OFFSET¹
Unsigned offset to the starting address of the Console Terminal Block
(CTB) table. See Section 2.3.8.2.


+192 CRB OFFSET¹
Unsigned offset to the starting address of the Console Callback Routine
Block (CRB). See Section 2.3.8.1.


+200 MEMDSC OFFSET¹
Unsigned offset to the starting address of the Memory Data Descriptor
Table (MEMDSC). See Section 3.4.1.2.
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Table 2-1: HWRPB Fields (Continued) 
Offset Description 


+208 CONFIG OFFSET¹
Unsigned offset to the starting address of the Configuration Data Table
(CONFIG). If zero, no CONFIG table exists. See Section 2.1.4.


+216 FRU TABLE OFFSET¹
Unsigned offset to the starting address of the Field Replaceable Unit
Table (FRU). If zero, no FRU table exists. See Section 2.1.5.


+224 SAVE_TERM RTN VA²,³
Starting virtual address of a routine that saves console terminal state.
This routine is optionally provided by system software. See Section
3.5.7. Set to zero by the console at system bootstraps.


+232 SAVE_TERM VALUE²,³
Procedure value of the SAVE_TERM routine optionally provided by
system software. The console copies this value into R27 before invoking
the routine. See Section 3.5.7. Set to zero by the console at system
bootstraps.


+240 RESTORE_TERM RTN VA²,³
Starting virtual address of a routine that restores console terminal state.
This routine is optionally provided by system software. See Section
3.5.7. Set to zero by the console at system bootstraps.


+248 RESTORE_TERM VALUE²,³
Procedure value of the RESTORE_TERM routine optionally provided
by system software. The console copies this value into R27 before
invoking the routine. See Section 3.5.7. Set to zero by the console at
system bootstraps.


+256 RESTART RTN VA²,³
Starting virtual address of a CPU restart routine provided by system
software. The console restarts system software by transferring control
to this routine. See Section 3.5. Set to zero by the console at system
bootstraps.


+264 RESTART VALUE²,³
Procedure value of the CPU restart routine provided by system software.
During the restart process, the console copies this value into R27
before transferring control to the CPU restart routine. See Section 3.5.
Set to zero by the console at system bootstraps.


+272 RESERVED FOR SYSTEM SOFTWARE²,³
Reserved for use by system software. Set to zero by the console at system
bootstraps.


+280 RESERVED FOR HARDWARE¹
Reserved for use by hardware.
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Table 2-1: HWRPB Fields (Continued) 
Offset Description 


+288 HWRPB CHECKSUM? 
Checksum of all the quadwords of the HWRPB from offset [00] to 
[280], inclusive. Computed as a 64-bit sum, ignoring overflows. Used 
to validate the HWRPB during warm bootstraps, restarts, and second- 
ary starts. Set by console initialization; recomputed and updated when- 
ever a HWRPB field with offset [00] to [280], inclusive, is modified by 
the console or system software. 


+296 RXRDY BITMASK²,³
Secondary receive bitmask for interprocessor console communications.
When transmitting a command to a secondary, the primary processor
sets the RXRDY bit, which corresponds to the CPU ID of the secondary.
The number of active bits in this field is determined by the number
of per-CPU slots in HWRPB[144]. See Section 2.4. All bits are initialized
as clear.


+304 TXRDY BITMASK²,³
Secondary transmit bitmask for interprocessor console communica- 
tions. When transmitting a message to the primary, the secondary pro- 
cessor sets the TXRDY bit, which corresponds to its CPU ID and 
requests an interprocessor interrupt to the primary. The number of 
active bits in this field is determined by the number of per-CPU slots in 
HWRPB[144]. See Section 2.4. All bits are initialized as clear. 


+312 DSRDB OFFSET¹
Unsigned offset to the starting address of the Dynamic System Recognition
Data Block.


+(HWRPB[136]) TB HINT BLOCK? 
Quadword-aligned block that describes the characteristics of the trans- 
lation buffer (TB) granularity hints. See Section 2.1.2. 


+(HWRPB[160]) Per-CPU SLOTS²,³
128-byte-aligned slots that describe each processor in the system. See
Section 2.1.3.


+(HWRPB[184]) CTB TABLE¹
Quadword-aligned Console Terminal Block Table. Set at console initialization;
modified by console terminal callbacks. See Section 2.3.8.2.


+(HWRPB[192]) CONSOLE CALLBACK ROUTINE BLOCK²,³
Quadword-aligned block that describes the location and mapping of 
the console callback routines. Set at system bootstrap; modified by 
console FIXUP callback. See Section 2.3.8.1. 
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Table 2-1: HWRPB Fields (Continued) 


Offset Description 
+(HWRPB[200]) MEMDSC¹,³


Quadword-aligned Memory Data Descriptor Table. Set at console ini- 
tialization; preserved across warm bootstraps. See Section 3.4.1.2. 


+(HWRPB[208]) CONFIG BLOCK¹
Optional implementation-dependent configuration block. See Section 
2.1.4. 


+(HWRPB[216]) FRU TABLE¹


Optional implementation-dependent field replaceable unit table. See 
Section 2.1.5. 


+(HWRPB[312]) DSRDB¹


Quadword-aligned Dynamic System Recognition Data Block
(DSRDB). Data in the DSRDB uniquely identifies each platform; it contains
a unique identifier, system name, and license units requirements
table. This field is valid for HWRPB revisions 5 and greater. See
Section 2.1.6.


¹ Initialized by the console at cold system bootstrap only. Preserved unchanged by the console
at all warm system bootstraps.


² Initialized by the console at all system bootstraps (cold or warm).


³ May be modified by system software.
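
The validation fields in Table 2-1 lend themselves to a short worked check. The sketch below assumes a little-endian byte image of the HWRPB. It also assumes that the HWRPB PA field is checked by comparing it with the physical address at which the block was found; the table states only that the console uses the field for validation. The function names, the sample physical address, and the image size are illustrative.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
# "HWRPB" followed by three zero bytes, read as a little-endian quadword.
VALIDATION = int.from_bytes(b"HWRPB\0\0\0", "little")
CHECKSUM_OFFSET = 288


def hwrpb_checksum(image: bytes) -> int:
    """Sum the 36 quadwords at offsets [00]..[280] as a 64-bit value, ignoring overflow."""
    return sum(struct.unpack_from("<36Q", image, 0)) & MASK64


def hwrpb_is_valid(image: bytes, found_at_pa: int) -> bool:
    """Check the HWRPB PA, HWRPB VALIDATION, and HWRPB CHECKSUM fields."""
    pa, validation = struct.unpack_from("<2Q", image, 0)
    (stored,) = struct.unpack_from("<Q", image, CHECKSUM_OFFSET)
    return (
        pa == found_at_pa
        and validation == VALIDATION
        and stored == hwrpb_checksum(image)
    )


# Build a toy image (made-up physical address, revision 7) and validate it.
image = bytearray(320)
struct.pack_into("<3Q", image, 0, 0x2000, VALIDATION, 7)
struct.pack_into("<Q", image, CHECKSUM_OFFSET, hwrpb_checksum(bytes(image)))
print(hwrpb_is_valid(bytes(image), 0x2000))
```

Any later change to a field in the range [00] to [280] would require the checksum to be recomputed and stored again, as the HWRPB CHECKSUM description requires.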


2.1.1 Serial Number, Revision, Type, and Variation Fields 


The HWRPB contains several serial number, revision, type, and variation fields that describe 
the Alpha system platform hardware and PALcode. System software uses these fields to iden- 
tify hardware-dependent support code that must be loaded or enabled. These fields are 
examined early in operating system bootstrap; if one of the fields contains a value that is unrec- 
ognized or incompatible with the operating system, the bootstrap attempt fails. Diagnostic 
software uses these fields to guide field installation and upgrade procedures and for material 
and parts control.


In multiprocessor systems, the processor type and PALcode revisions need not be identical for 
all processors. Console and system software can use these fields to determine if multiproces- 
sor operation is viable. This evaluation may be performed by the running primary, the starting 
secondary, or a combination of both. For an example, see Section 3.4.3.3. 


2.1.1.1 Serial Number and Revision Fields 
The revision fields include: 
• HWRPB revision - HWRPB[16]


This field identifies the format of the HWRPB. Since the HWRPB is shared between 
the console and system software, both must agree on the field offsets, formats, and 
interpretations. 

• System serial number and revision — HWRPB[64] and HWRPB[96]


These fields identify the system platform hardware serial number and revision 
according to DEC STD 12. 


The system serial number and revision fields must be distinct from the processor serial 
number and revision fields in the per-CPU slots, located by the offset in HWRPB[160]. In
particular, on multiprocessing systems, the system fields must not be simply replicated 
from the fields of the primary processor. The system fields must be constant 
regardless of which processor serves as primary and must have persistence across 
processor failures and/or replacement. 


• Processor type and processor variation (capabilities) — SLOT[176] and SLOT[184]


These per-CPU slot fields identify each Alpha processor and its capabilities. The type 
field (SLOT[176]) contains a major and minor subfield. The major subfield identifies 
the processor family and the minor subfield identifies the particular membership in 
that family. 


The variation (capabilities) field (SLOT[184]) identifies any system-specific attributes 
(such as local memory or cache size). 


Processor type and variation field assignments are listed in Appendix D.


• Processor Revision — SLOT[192]


This per-CPU slot field identifies the processor hardware revision according to DEC 
STD 12. 


• PALcode Revision — SLOT[168]


This field identifies the PALcode revision required and/or in use by the processor. 
System software uses the PALcode variation and PALcode compatibility subfields. 
The variation subfield indicates whether the PALcode image includes extensions or 
functional variations necessary to a given operating system or application. 


Programming Note: 
For example, a PALcode variation may contain a different TB fill routine. System 
software (and optionally the console) uses the compatibility subfield to ensure that 
all processors in a multiprocessor system are using compatible PALcode images. 


PALcode revisions are specific to the system platform and processor major type. The 
file name of distributed PALcode images must contain sufficient information to 
distinguish the intended system platform and processor. 


• PALcode Revisions Available — SLOT[464]


This field identifies the PALcode variant revisions that have been previously loaded 
on this processor. System software uses these fields to determine if a given PALcode 
variant and revision are present before PALcode switching. The format follows the 
PALcode revision field in SLOT[168].


PALcode variation assignments are listed in Appendix D. 
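
The subfields of the PALcode revision quadword can be separated with shifts and masks. The following Python sketch assumes the bit layout given for SLOT[168] in Table 2-4; the function names are illustrative, and the compatibility test (both subfields known and equal) is an assumed policy rather than one stated by this specification.

```python
def decode_palcode_revision(quadword):
    """Split a PALcode revision quadword (SLOT[168] layout) into subfields."""
    return {
        "max_sharing": (quadword >> 48) & 0xFFFF,    # bits 63-48
        "compatibility": (quadword >> 32) & 0xFFFF,  # bits 47-32, 0 = unknown
        "variation": (quadword >> 16) & 0xFF,        # bits 23-16
        "major": (quadword >> 8) & 0xFF,             # bits 15-8
        "minor": quadword & 0xFF,                    # bits 7-0
    }


def same_compatibility(rev_a, rev_b):
    """Assumed policy: both compatibility subfields are known and equal."""
    a = decode_palcode_revision(rev_a)["compatibility"]
    b = decode_palcode_revision(rev_b)["compatibility"]
    return a != 0 and a == b
```

A multiprocessor check of this kind would be applied to the SLOT[168] value of each processor before it joins the active set.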

2.1.1.2 System Type and Variation Fields
The system type and system variation fields are HWRPB[80] and HWRPB[88]. 


These fields identify the Alpha system platform. System software infers attributes such as
physical address offsets and I/O device locations from the system type. The system type field
contains the family and member identification numbers, along with the major and minor
subfield identifiers. It is described in Appendix D. The system variation field is described in
Table 2-2.


The following system variations are defined: 


Table 2-2: System Variation Field (HWRPB[88])


Bits     Description

63-16    Reserved — MBZ

15-10    System Type Specific (STS). Registered system identifiers for system
         member identification.

9        GRAPHICS — If set, indicates that the platform contains an embedded
         graphics processor. Initialized by the console at all cold bootstraps.

8        POWERFAIL RESTART — If set, indicates that the console should restart
         all available processors on a powerfail recovery. If clear, only the
         primary processor will be restarted. Cleared by the console at system
         bootstraps; may be set by system software.

7-5      POWERFAIL — Indicates the type of powerfail (if any) implemented by
         this platform. See Section 3.5.3 for more information. Defined values
         include:

         <7:5>    Interpretation
         000      Reserved
         001      United
         010      Separate
         011      Full battery backup of system platform hardware

         Initialized by the console at all cold bootstraps.

4-1      CONSOLE — Indicates the type of console. Defined values include:

         <4:1>    Interpretation
         0000     Reserved
         0001     Detached service processor
         0010     Embedded console
         Other    Reserved for future use

         Initialized by the console at all cold bootstraps.

0        MPCAP — If set, indicates this system platform is capable of being
         configured as a multiprocessor; all support for multiprocessing is
         present, even if only one processor is present. If clear, this system
         supports a uniprocessor only. Initialized by the console at all cold
         bootstraps.
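
As an illustration of Table 2-2, the following Python sketch unpacks a system variation quadword into its named subfields. The function name, the dictionary keys, and the "Undefined" label for POWERFAIL values that the table does not list are assumptions made for the example.

```python
POWERFAIL_TYPES = {
    0b000: "Reserved",
    0b001: "United",
    0b010: "Separate",
    0b011: "Full battery backup of system platform hardware",
}
CONSOLE_TYPES = {
    0b0000: "Reserved",
    0b0001: "Detached service processor",
    0b0010: "Embedded console",
}


def decode_system_variation(quadword):
    """Decode HWRPB[88] according to the bit assignments in Table 2-2."""
    return {
        "reserved_is_zero": (quadword >> 16) == 0,        # bits 63-16, MBZ
        "sts": (quadword >> 10) & 0x3F,                   # bits 15-10
        "graphics": bool((quadword >> 9) & 1),            # bit 9
        "powerfail_restart": bool((quadword >> 8) & 1),   # bit 8
        # Values not listed in the table are labeled "Undefined" here.
        "powerfail": POWERFAIL_TYPES.get((quadword >> 5) & 0x7, "Undefined"),
        "console": CONSOLE_TYPES.get((quadword >> 1) & 0xF,
                                     "Reserved for future use"),
        "mpcap": bool(quadword & 1),                      # bit 0
    }
```

For example, a value with bits 9, 6, 5, 2, and 0 set describes a multiprocessor-capable platform with embedded graphics, full battery backup, and an embedded console.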


2.1.2 Translation Buffer Hint Block 


The Translation Buffer Hint Block (TBB) contains information on the characteristics of the 
instruction stream translation buffer (ITB) and data stream translation buffer (DTB) granular- 
ity hints (GH). All processors in a multiprocessor Alpha system must implement the same 
granularity hints. The granularity hint fields are listed in Table 2-3. 

The TBB consists of 8 quadwords, 4 for each of the translation buffers (ITB and DTB). The 4
quadwords contain 16 word fields; each word contains the number of entries in the translation 
buffer that implement a combination of granularity hints (including none). 


Table 2-3: Granularity Hint Fields


Offset₁₆   Granularity Hint

0          None
2          1 page
4          8 pages
6          1 and 8 pages
8          64 pages
A          1 and 64 pages
C          8 and 64 pages
E          1, 8, and 64 pages
10         512 pages
12         1 and 512 pages
14         8 and 512 pages
16         1, 8, and 512 pages
18         64 and 512 pages
1A         1, 64, and 512 pages
1C         8, 64, and 512 pages
1E         1, 8, 64, and 512 pages
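
The offsets in Table 2-3 follow a regular pattern: half the offset, read as a 4-bit mask, selects the 1, 8, 64, and 512 page hints. The Python sketch below relies on that observation to decode the 16 word fields that describe one translation buffer. Little-endian byte order and the function names are assumptions of the example.

```python
import struct

GH_PAGES = (1, 8, 64, 512)  # page counts named in Table 2-3


def gh_combination(offset):
    """Return the granularity hints selected by a Table 2-3 word offset."""
    if offset % 2 or not 0 <= offset <= 0x1E:
        raise ValueError("offset must be an even value from 0 to 1E hex")
    index = offset // 2
    return [pages for bit, pages in enumerate(GH_PAGES) if index & (1 << bit)]


def decode_tb_hints(raw):
    """Map each hint combination to its entry count for one 32-byte block.

    raw holds the 4 quadwords (16 words) that describe one translation
    buffer; little-endian word order is assumed.
    """
    counts = struct.unpack("<16H", raw)
    return {tuple(gh_combination(2 * i)): n for i, n in enumerate(counts)}
```

The key `()` in the returned dictionary corresponds to offset 0, the entries that implement no granularity hint.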

2.1.3 Per-CPU Slots in the HWRPB


Information on the state of a processor is contained in a "per-CPU slot" data structure for that 
processor. The per-CPU slots form a contiguous array indexed by CPU ID. The starting 
address of the first per-CPU slot is given by the offset HWRPB[160] relative to the starting 
address of the HWRPB. The number of per-CPU slots is given in HWRPB[144]. Each 
per-CPU slot must be 128-byte-aligned to ensure natural alignment of the hardware privileged 
context block (HWPCB) at SLOT[0]. The slot size, rounded up to the nearest multiple of 128 
bytes, is given in HWRPB[152]. 


CPU IDs are determined by the implementation. The only requirement is that they be in the
range of zero to the maximum number of processors the particular platform supports minus 
one. 


Software Note: 
OpenVMS Alpha supports CPU IDs in the range 0-31 only. 


Each per-CPU slot contains information necessary to bootstrap, start, restart, or halt the
processor. The format is shown in Figure 2-3 and Table 2-4. The hardware privileged context block
(HWPCB) specifies the context in which the loaded system software will execute.


The console must initialize the per-CPU slot for the primary processor before system boot- 
strap. The per-CPU slot fields for secondary processors are set by a combination of the 
console and system software. The console updates the halt information at error halts and 
before processor restarts. 


Slots corresponding to nonexistent processors are zeroed. There may be more per-CPU slots 
than are necessary in any given Alpha system. A system implementation may reserve HWRPB 
space for processors that are not present at system bootstrap. 


An Alpha system may support internally different, yet software compatible, PALcode for dif- 
ferent processors in a multiprocessor implementation. Each per-CPU slot contains a PALcode 
memory descriptor that locates the PALcode used by that processor. See Section 3.3.1 for 
information on PALcode loading and initialization on the primary processor and Section 
3.4.3.3 for information on PALcode loading and initialization on secondary processors. 


The starting address of a per-CPU slot is calculated by: 


    Slot Address = {CPU ID * slot size} + offset + HWRPB base
                 = {CPU ID * HWRPB[152]} + HWRPB[160] + #HWRPB


The address may be physical or virtual. 
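
The slot address calculation can be sketched in Python as follows. The function and parameter names are illustrative; the range and alignment checks reflect the requirements stated earlier in this section (HWRPB[144] slots, slot size a multiple of 128 bytes).

```python
def per_cpu_slot_address(hwrpb_base, slot_size, slots_offset, cpu_id, slot_count):
    """Return the starting address of the per-CPU slot for cpu_id.

    hwrpb_base   -- starting address of the HWRPB (physical or virtual)
    slot_size    -- value of HWRPB[152]
    slots_offset -- value of HWRPB[160]
    slot_count   -- value of HWRPB[144]
    """
    if not 0 <= cpu_id < slot_count:
        raise ValueError("CPU ID has no per-CPU slot")
    if slot_size % 128:
        raise ValueError("slot size must be a multiple of 128 bytes")
    return cpu_id * slot_size + slots_offset + hwrpb_base
```

With a base of 2000₁₆, a slot size of 640 bytes, and a slot offset of 200₁₆, the slot for CPU ID 2 starts at 2 * 640 + 512 + 8192 = 9984.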

Figure 2-3: Per-CPU Slot in HWRPB

Table 2-4: Per-CPU Slot Fields


Offset   Description

SLOT     HWPCB!?
         Hardware privileged context block (HWPCB) for this processor. See
         Table 3-8 for the contents as set by the console.

+128     STATE FLAGS¹
         Current state of this processor. See Table 2-5 for the interpretation
         of each bit.

+136     PALCODE MEMORY SPACE LENGTH**
         Number of bytes required by this processor for PALcode memory.
         Unsigned field.

+144     PALCODE SCRATCH SPACE LENGTH?**”
         Number of bytes required by this processor for PALcode scratch space.
         Unsigned field.

+152     PA OF PALCODE MEMORY SPACE??>
         Starting physical address of PALcode memory space for this processor.
         PALcode memory space must be page aligned. See Section 3.3.1 or
         Section 3.4.3.3.

+160     PA OF PALCODE SCRATCH SPACE”?
         Starting physical address of PALcode scratch space for this processor.
         PALcode scratch space must be page aligned. See Section 3.3.1 or
         Section 3.4.3.3.

+168     PALCODE REVISION??+©
         PALcode revision level for this processor:

         Bits     Interpretation
         63-48    Maximum number of processors that can share this PALcode image
         47-32    PALcode compatibility (0-65535):
                    0          Unknown
                    1-65535    Compatibility revision
         31-24    SBZ
         23-16    PALcode variation (0-255)
         15-8     PALcode major revision (0-255)
         7-0      PALcode minor revision (0-255)

         This field identifies the PALcode revision required by the console
         and/or processor initialization. The major and minor PALcode revisions
         are set at console initialization; the remaining fields are set during
         PALcode loading and initialization. This field must be updated after
         PALcode switching to reflect the new PALcode environment. See Sections
         2.1.1 and 3.4.3.3. Also see Appendix D.

+176     PROCESSOR TYPE>4
         Type of this processor:

         Bits     Interpretation
         63-32    Minor type
         31-0     Major type

         The processor types are defined in Appendix D.

+184     PROCESSOR VARIATION?
         The following processor variations are defined:

         Bit      Description
         63-3     RESERVED — MBZ
         2        PRIMARY ELIGIBLE (PE) — If set, indicates that this processor
                  is eligible to become a primary processor. The processor has
                  direct access to the console, a BB_WATCH, and all I/O
                  devices. See Chapter 3.
         1        IEEE-FP — If set, indicates this processor supports IEEE
                  floating-point operations and data types. If clear, this
                  processor has no such support.
         0        VAX-FP — If set, indicates this processor supports VAX
                  floating-point operations and data types. If clear, this
                  processor has no such support.

+192     PROCESSOR REVISION>*4
         Full DEC STD 12 revision field for this processor. This quadword
         field contains four ASCII characters. See Section 2.1.1.

+200     PROCESSOR SERIAL NUMBER?“ 7
         Full DEC STD serial number for this processor module. This octaword
         field contains a 10-character ASCII serial number determined at the
         time of manufacture; see DEC STD 12 for format information.

+216     PA OF LOGOUT AREA?
         Starting physical address of PALcode logout area for this processor.
         Logout areas must be at least quadword aligned.

+224     LOGOUT AREA LENGTH?
         Number of bytes in the PALcode logout area for this processor.

+232     HALT PCBB)?
         Value of the PCBB register when a processor halt condition is
         encountered by this processor. Initialized to the address of the
         hardware privileged context block (HWPCB) at offset [0] from this
         per-CPU slot at system bootstraps or secondary processor starts.

+240     HALT PC!’
         Value of the PC when a processor halt condition is encountered by
         this processor. Zeroed at system bootstraps or secondary processor
         starts.

+248     HALT PS¹
         Value of the PS when a processor halt condition is encountered by
         this processor. Zeroed at system bootstraps or secondary processor
         starts.

+256     HALT ARGUMENT LIST!”
         Value of R25 (argument list) when a processor halt condition is
         encountered by this processor. Zeroed at system bootstraps or
         secondary processor starts.

+264     HALT RETURN ADDRESS?”
         Value of R26 (return address) when a processor halt condition is
         encountered by this processor. Zeroed at system bootstraps or
         secondary processor starts.

+272     HALT PROCEDURE VALUE!”
         Value of R27 (procedure value) when a processor halt condition is
         encountered by this processor. Zeroed at system bootstraps or
         secondary processor starts.

+280     REASON FOR HALT!”
         Indicates why this processor was halted. Values include:

         Code₁₆   Reason
         0        Bootstrap, processor start, or powerfail restart
         1        Console operator requested a system crash
         2        Processor halted due to kernel-stack not-valid halt
         3        Invalid SCBB
         4        Invalid PTBR
         5        Processor executed CALL_PAL HALT instruction in kernel mode
         6        Double error abort encountered
         7        Machine check while in PALcode environment
         8-FFF    Reserved
         Other    Implementation-specific

         Code is set to "0" at console initialization.

+288     RESERVED FOR SOFTWARE”
         Reserved for use by system software. Zeroed at system bootstraps or
         secondary processor starts.

+296     RXTX BUFFER AREA
         Used for interprocessor console communication. See Section 2.4.

+464     PALCODE AVAILABLE?*
         Block of 16 quadwords that list previously loaded PALcode variations
         that are available to the console or operating system for PALcode
         switching.

         The first offset (SLOT[464]) is reserved for an overall firmware
         revision field for this processor, the format of which is determined
         by the HWRPB revision level found at HWRPB[16]. If HWRPB[16] contains
         6 or less, the format for SLOT[464] is platform specific. If
         HWRPB[16] is greater than 6, the format for SLOT[464] is as follows:

         Bits     Interpretation
         63-48    Maximum number of processors that can share this console
         47-32    Console build sequence number (0-16383)
         31-24    SBZ
         23-16    Variant (0 for console version)
         15-8     Console major revision (0-255)
         7-0      Console minor revision (0-255)

         The format of each subsequent quadword follows the PALcode revision
         field (SLOT[168]). Each quadword is indexed by PALcode variant. If the
         quadword is non-zero, the PALcode variant has been loaded and the
         operating system may switch to that PALcode variant by passing the
         variant number to CALL_PAL SWPPAL.

+592     PROCESSOR SOFTWARE COMPATIBILITY FIELD®
         Type of pre-existing processor that is software compatible with the
         existing processor. Format follows SLOT[176].

         Bits     Interpretation
         63-32    Minor type
         31-0     Major type

+600     RESERVED
         Reserved for DIGITAL; SBZ.


¹ Initialized by the console for the primary at all system bootstraps (cold or warm) and for a
secondary before processor start.

² May be modified by system software for a secondary before processor start.

³ Initialized by the console for a secondary at cold system bootstrap only. Preserved unchanged
by the console at all other times.

⁴ Initialized by the console for the primary at cold system bootstrap only. Preserved unchanged
by the console at all other times.

⁵ Support PALcode loading as described in Section 3.3.

⁶ May be modified by system software for the primary.

⁷ Set by the console at all processor halts.

⁸ Initialized by the console at cold bootstrap and never written by system software or console.
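
Two of the per-CPU slot fields in Table 2-4 lend themselves to a short decoding example. The Python sketch below interprets the REASON FOR HALT code (SLOT[280]) and the PROCESSOR VARIATION bits (SLOT[184]); the function names and returned strings are illustrative, not part of the architecture.

```python
HALT_REASONS = {
    0: "Bootstrap, processor start, or powerfail restart",
    1: "Console operator requested a system crash",
    2: "Processor halted due to kernel-stack not-valid halt",
    3: "Invalid SCBB",
    4: "Invalid PTBR",
    5: "Processor executed CALL_PAL HALT instruction in kernel mode",
    6: "Double error abort encountered",
    7: "Machine check while in PALcode environment",
}


def describe_halt_reason(code):
    """Translate a REASON FOR HALT code into the Table 2-4 description."""
    if code in HALT_REASONS:
        return HALT_REASONS[code]
    if 0x8 <= code <= 0xFFF:
        return "Reserved"
    return "Implementation-specific"


def decode_processor_variation(quadword):
    """Return the three defined PROCESSOR VARIATION flags (bits 2-0)."""
    return {
        "primary_eligible": bool(quadword & 0b100),  # bit 2, PE
        "ieee_fp": bool(quadword & 0b010),           # bit 1
        "vax_fp": bool(quadword & 0b001),            # bit 0
    }
```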


Table 2-5: Per-CPU State Flags


Bit      Description

63:24    RESERVED; MBZ.

23:16    HALT REQUESTED¹
         Indicates the console action requested by system software executing
         on this processor. Values include:

         Code₁₆   Reason
         0        Default (no specific action)
         1        SAVE_TERM/RESTORE_TERM exit
         2        Cold bootstrap requested
         3        Warm bootstrap requested
         4        Remain halted (no restart)
         Other    Reserved

         Set to "0" at system bootstraps and secondary processor starts. May
         be set to non-zero by system software before processor halt and
         subsequent processor entry into console I/O mode. See Sections 3.5.7
         and 3.4.5.

15:9     RESERVED; MBZ.

8        PALCODE LOADED (PL)?*°
         Indicates that this processor's PALcode image has been loaded into
         the address given in the processor's slot PALcode memory space
         address field. See Sections 3.3.1 and 3.4.3.3.

7        PALCODE MEMORY VALID (PMV)?
         Indicates that this processor's PALcode memory and scratch space
         addresses are valid. Set after the necessary memory is allocated and
         the addresses are written into the processor's slot. See Sections
         3.3.1 and 3.4.3.3.

6        PALCODE VALID (PV)*
         Indicates that this processor's PALcode is valid. Set after PALcode
         has been successfully loaded and initialized. See Sections 3.3.1 and
         3.4.3.3.

5        CONTEXT VALID (CV)¹,³
         Indicates that the HWPCB in this slot is valid. Set after the console
         or system software initializes the HWPCB in this slot. See Sections
         3.3.1 and 3.4.3.

4        OPERATOR HALTED (OH)!®
         Indicates that this processor is in console I/O mode as the result of
         explicit operator action. See Section 3.5.8.

3        PROCESSOR PRESENT (PP)*”
         Indicates that this processor is physically present in the
         configuration.

2        PROCESSOR AVAILABLE (PA)*>
         Indicates that this processor is available for use by system
         software. The PA bit may differ from the PP bit based on self-test or
         other diagnostics, or as the result of a console command that
         explicitly sets this processor unavailable.

1        RESTART CAPABLE (RC)!73
         Indicates that system software executing on this processor is capable
         of being restarted if a detected error halt, powerfail recovery, or
         other error condition occurs. Cleared by the console and set by
         system software. See Sections 3.4.1.3, 3.4.3.6, and 3.5.1.

0        BOOTSTRAP IN PROGRESS (BIP)!:*
         For the primary, this bit indicates that this processor is undergoing
         a system bootstrap. For a secondary, this bit indicates that a CPU
         start operation is in progress. Set by the console and cleared by
         system software. See Sections 3.4.1.3, 3.4.3.6, and 3.5.1.


¹ Initialized by the console for the primary at all system bootstraps (cold or warm) and for a
secondary before processor start.

² May be modified by system software for the primary.

³ May be modified by system software for a secondary before processor start.

⁴ Initialized by the console for primary at cold system bootstrap only. Preserved unchanged by
the console at all other times.

⁵ Initialized by the console for a secondary at cold system bootstrap only. Preserved
unchanged by the console at all other times.

⁶ Set by the console at all processor halts.
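
The STATE FLAGS quadword (SLOT[128]) combines single-bit flags with the 8-bit HALT REQUESTED code. A minimal Python sketch of the decoding, using the bit positions from Table 2-5, is shown below; the function name and the use of the flag abbreviations as set members are choices made for the example.

```python
STATE_FLAG_BITS = {
    0: "BIP", 1: "RC", 2: "PA", 3: "PP", 4: "OH",
    5: "CV", 6: "PV", 7: "PMV", 8: "PL",
}
HALT_REQUESTED = {
    0: "Default (no specific action)",
    1: "SAVE_TERM/RESTORE_TERM exit",
    2: "Cold bootstrap requested",
    3: "Warm bootstrap requested",
    4: "Remain halted (no restart)",
}


def decode_state_flags(quadword):
    """Return (set of flag abbreviations, HALT REQUESTED description)."""
    flags = {name for bit, name in STATE_FLAG_BITS.items()
             if (quadword >> bit) & 1}
    halt_code = (quadword >> 16) & 0xFF          # bits 23:16
    return flags, HALT_REQUESTED.get(halt_code, "Reserved")
```

System software scanning for usable secondaries might, for instance, require both "PP" and "PA" in the returned set before attempting a processor start.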

2.1.4 Configuration Data Block


Systems may have a Configuration Data Block (CONFIG). The format of the block and 
whether it exists in a system is implementation specific. If present, the block must be mapped 
in the bootstrap address space. The CONFIG offset at HWRPB[208] contains the block offset 
address; if no CONFIG block exists, the offset is zero. The first quadword of a CONFIG block 
must contain the size in bytes of the block. The second quadword must contain a checksum for 
the block. The checksum is computed as a 64-bit sum, ignoring overflows, of all quadwords in 
the configuration data block except the checksum quadword. 
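
The checksum rule can be illustrated with a short Python sketch. It assumes that the size in the first quadword counts the whole block (including the size and checksum quadwords), that the size is a multiple of 8, and that quadwords are stored little-endian; none of these points is stated explicitly above, and the function names are illustrative.

```python
import struct

MASK64 = (1 << 64) - 1


def config_checksum(block):
    """Sum all quadwords of a CONFIG block except the checksum quadword."""
    (size,) = struct.unpack_from("<Q", block, 0)
    if size < 16 or size % 8 or size > len(block):
        raise ValueError("implausible CONFIG block size")
    quads = struct.unpack_from("<%dQ" % (size // 8), block, 0)
    # Index 1 is the checksum quadword itself; overflow is discarded.
    return sum(q for i, q in enumerate(quads) if i != 1) & MASK64


def config_block_valid(block):
    """Compare the stored checksum (second quadword) with the computed one."""
    (stored,) = struct.unpack_from("<Q", block, 8)
    return stored == config_checksum(block)
```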


2.1.5 Field Replaceable Unit Table 


Systems may have a field replaceable unit (FRU) table. The format of the table and whether it 
exists in a system is implementation specific. If present, the table must be mapped in the boot- 
strap address space. The FRU table offset at HWRPB[216] contains the table offset address; if 
no FRU table exists, the offset is zero. 


2.1.6 \Dynamic System Recognition Data Block (DSRDB) 


The DSRDB, pointed to by offset HWRPB[312], provides dynamic system recognition. It con- 
tains the contents of the LURT tables. 


All fields in the DSRDB are initialized by the console at all system bootstraps (cold or warm) 
and never written by system software. 


The DSRDB structure is shown in Figure 2—4 and described in Table 2-6. 

Figure 2-4: Dynamic System Recognition Data Block (DSRDB)


63                                                                    0
SMM                                                                     :DSRDB
Offset to LURT table                                                    :+08
Offset to name count                                                    :+16
(See Note 1)
Count of LURT                                                           :+(DSRDB[08])
LURT column 1                                                           :+(DSRDB[08])+8
LURT column 2                                                           :+(DSRDB[08])+16
  .
  .
LURT column n                                                           :+(DSRDB[08])+n
(See Note 2)
Name count                                                              :+(DSRDB[16])
ASCII platform name (0 terminated and 0 filled to end of quadword)      :+(DSRDB[16])+8


Notes

1. This area is zero length. However, software should not depend on adjacency but rather
   use the offset in DSRDB+8 to find the address of the LURT table.

2. This area is zero length. However, software should not depend on adjacency but rather
   use the offset in DSRDB+16 to find the address of the name count table.


Table 2-6: Dynamic System Recognition Data Block (DSRDB) Fields


Offset            Description

DSRDB             SMM
                  System marketing model number. Unique binary identifier for
                  the platform. Values are found in the [...]

+08               Offset to LURT table
                  Unsigned offset from beginning of DSRDB to starting address
                  of LURT table.

+16               Offset to name count
                  Unsigned offset from beginning of DSRDB to quadword that
                  contains the count of the platform ASCII name string.

+(DSRDB[8])       LURT table
                  First quadword contains a count of the number of LURT table
                  entries. Each entry is a quadword and corresponds to a column
                  in the existing LURT table format shared by the OpenVMS Alpha
                  and DIGITAL UNIX operating systems.

+(DSRDB[16])      Name count
                  Quadword that contains a count of the size of the ASCII model
                  name string. Count does not include any trailing <NUL> fill
                  characters.

+(DSRDB[16])+8    Name string
                  ASCII model name string, terminated by at least one <NUL>
                  character. Filled to end of last quadword with <NUL>
                  characters.\
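
The offsets in Table 2-6 can be followed with a few unpack operations. The Python sketch below parses a DSRDB held in a byte buffer; it assumes little-endian quadwords and treats the SMM as an unsigned value, and the function name and result keys are illustrative only.

```python
import struct


def parse_dsrdb(buf):
    """Extract the SMM, LURT entries, and platform name from a DSRDB image."""
    smm, lurt_offset, name_offset = struct.unpack_from("<3Q", buf, 0)

    # LURT table: a count quadword followed by that many quadword entries.
    (lurt_count,) = struct.unpack_from("<Q", buf, lurt_offset)
    lurt = list(struct.unpack_from("<%dQ" % lurt_count, buf, lurt_offset + 8))

    # Name: a count quadword followed by the NUL-filled ASCII string.
    (name_length,) = struct.unpack_from("<Q", buf, name_offset)
    start = name_offset + 8
    name = buf[start:start + name_length].decode("ascii")

    return {"smm": smm, "lurt": lurt, "name": name}
```

Because both tables are located through the offsets at DSRDB+8 and DSRDB+16, the sketch does not depend on the LURT table or the name being adjacent to the header.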


2.2 Environment Variables 


The environment variables provide an easily extensible mechanism for managing complex 
console state. Such state may be variable length, may change with system software, may 
change as a result of console state changes, and may be established by the console presenta- 
tion layer. Environment variables may be read, written, or saved. 


An environment variable consists of an identifier (ID) and a byte stream value maintained by 
the console. There are three classes of environment variables:

1. Common to all implementations: ID = 0 to 3F₁₆.


These have meaning to both the console and system software. All consoles must 
implement all of these environment variables. 


2. Specific to a given console implementation: ID = 40 to 7F₁₆.


These have meaning to a given console implementation and system software 
implementation. Support for these environment variables is optional. 


3. Specific to system software: ID = 80 to FF₁₆.


These have meaning to a given system software application or implementation; the 
console passes these environment variables between the console presentation layer 
and the target application without interpretation. Support for these environment 
variables is optional. 


If a console supports optional environment variables, they must be described in the relevant 
console implementation specification and registered with the Alpha architecture group. 
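
The three ID ranges can be expressed as a small classification routine. The Python sketch below is illustrative; the function name and the returned labels are not defined by this specification.

```python
def classify_env_id(env_id):
    """Return the class of an environment variable ID (0 to FF hex)."""
    if not 0 <= env_id <= 0xFF:
        raise ValueError("environment variable ID out of range")
    if env_id <= 0x3F:
        return "common"            # required in all console implementations
    if env_id <= 0x7F:
        return "console-specific"  # optional, console implementation defined
    return "system-software"       # optional, passed without interpretation
```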


The value, format, and size of each environment variable depends on the environment variable 
and the console implementation. The size of an environment variable value is specified in 
bytes. The byte stream value of most environment variables consists of an ASCII string. 


The booting environment variables, BOOT_DEV, BOOTDEF_DEV, and BOOTED_DEV, 
contain values that can consist of multiple fields and lists. For those variables, the values are 
parsed as follows: 


• Each field is delimited by one and only one space " " (20₁₆).


• Each list element is delimited by one and only one comma "," (2C₁₆).


• Any numeric quantities are expressed in decimal.


• All characters are case-blind and may be expressed in uppercase or lowercase.


Other examples of environment variables that have list values are BOOTED_OSFLAGS and 
DUMP_DEV. 




Programming Note: 


For example, BOOT_DEV might consist of "0 4 MSCP,0 1 MOP" and BOOT_OSFLAGS 
might consist of "7,2,28". 
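
The parsing rules for list-valued environment variables can be sketched in Python as follows. The function name is illustrative, and folding the value to uppercase is one possible way to honor the case-blind rule, not a requirement of this specification.

```python
def parse_env_list(value):
    """Split a list-valued environment variable into elements and fields.

    Elements are separated by a single comma and fields within an element
    by a single space. The value is folded to uppercase because characters
    are case-blind.
    """
    if value == "":
        return []
    return [element.split(" ") for element in value.upper().split(",")]
```

Applied to the values in the Programming Note, "0 4 MSCP,0 1 MOP" yields two elements of three fields each, and "7,2,28" yields three single-field elements.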


System software uses the console environment variable routines to access the environment 
variables. Each environment variable is identified by an identification number (ID). If the con- 
sole resolves the ID, the associated byte stream value is returned. The console environment 
variable routines present system software with a consistent interface to environment variables 
regardless of the presentation layer and internal console representation. The console operator 
interacts with the console presentation layer to access environment variables. See Section 1.3 
for details. 


In a multiprocessor system, the console must ensure that the dynamic state created by the envi- 
ronment variables is common to all processors. It must not be possible for a value observed on 
a secondary to differ from that observed on the primary or another secondary. This is neces- 
sary to support bootstrapping, restarting a processor, and switching the primary. 


Some environment variables contain critical state that must be maintained across console ini- 
tializations and system power transitions. Other environment variables contain dynamic state 
that must be initialized at console initialization and retained across warm bootstraps. Still oth- 
ers contain dynamic state that is initialized at each system bootstrap. 


Environment variable values that must be maintained across console initializations must be 
retained in some sort of nonvolatile storage. Default values for these environment variables 
must be set before system shipment. Thus, there are three possible values: the dynamic value, 
the default value retained in nonvolatile storage, and the initial default value set in nonvolatile 
storage before system shipment. The console need not preserve the initial default value. If con- 
sole implementation preserves the initial default value, that value is accessible only to the 
console presentation layer; system software accesses only the dynamic and default (last writ- 
ten) values. The dynamic and default values may differ at any time after console initialization 
as the result of changes by system software or the console operator. 


The internal mechanisms for representing and implementing environment variables are deter- 
mined by the console and are unknown to both system software and the console presentation 
layer. The method of handling the required nonvolatile storage also depends on the
implementation.


Table 2-7 lists the environment variables maintained by the console. Each environment ID is 
also assigned a symbolic name that is used to reference the environment variable elsewhere in 
this specification. Tables 2-8 and 2-9, respectively, list supported languages and character 
sets. 

Table 2-7: Required Environment Variables


ID₁₆     Environment Variable Symbol
         Description

00       Reserved

01       AUTO_ACTION¹
         Console action following an error halt or powerup. Defined values and
         the action invoked are:

         • "BOOT" (54 4F 4F 42₁₆) bootstrap
         • "HALT" (54 4C 41 48₁₆) halt
         • "RESTART" (54 52 41 54 53 45 52₁₆) restart

         Any other value causes a halt. The default value when the system is
         shipped is "HALT". See Section 3.151,

02       BOOT_DEV²
         Device list used by the last (or currently in progress) bootstrap
         attempt. The console modifies BOOT_DEV at console initialization and
         when a bootstrap attempt is initiated by a BOOT command. The value of
         BOOT_DEV is set from the device list specified with the BOOT command
         or, if no device list is specified, BOOTDEF_DEV. The console uses
         BOOT_DEV without change on all bootstrap attempts that are not
         initiated by a BOOT command. See Section 3.4.1.5. The format is
         independent of the console presentation layer.

03       BOOTDEF_DEV¹,²
         Device list from which bootstrapping is to be attempted when no path
         is specified by a BOOT command. See Section 3.4.1.5. The format
         follows BOOT_DEV. The default value when the system is shipped
         indicates a valid implementation-specific device or NULL (00₁₆).

04       BOOTED_DEV°
         Device used by the last (or currently in progress) bootstrap attempt.
         Value is one of the devices in the BOOT_DEV list. See Section
         3.4.1.5. The format is independent of the console presentation layer.

05       BOOT_FILE¹,²
         File name to be used when a bootstrap requires a file name and when
         the bootstrap is not the result of a BOOT command or when no file
         name is specified on a BOOT command. The console passes the value
         between the console presentation layer and system software without
         interpretation; the value is preserved across warm bootstraps. The
         default value when the system is shipped is NULL (00₁₆).

06       BOOTED_FILE?
         File name used by the last (or currently in progress) bootstrap
         attempt. The value is derived from BOOT_FILE or the current BOOT
         command. The console passes the value between the console
         presentation layer and system software without interpretation.

07       BOOT_OSFLAGS¹,²
         Additional parameters to be passed to system software when the
         bootstrap is not the result of a BOOT command or when no parameters
         are specified on a BOOT command. The console preserves the value
         across warm bootstraps and passes the value between the console
         presentation layer and system software without interpretation. The
         default value when the system is shipped is NULL (00₁₆).

08       BOOTED_OSFLAGS?
         Additional parameters passed to system software during the last (or
         currently in progress) bootstrap attempt. The value is derived from
         BOOT_OSFLAGS or the current BOOT command. The console passes the
         value between the console presentation layer and system software
         without interpretation.

09       BOOT_RESET¹,²
         Indicates whether a full system reset is performed in response to an
         error halt or BOOT command. Defined values and the action invoked
         are:

         • "OFF" (46 46 4F₁₆) warm bootstrap, no full system reset is
           performed.
         • "ON" (4E 4F₁₆) cold bootstrap, a full system reset is performed.

         See Sections 3.4.1 and 3.4.2. The default value when the system is
         shipped is implementation specific.

0A       DUMP_DEV!?
         Device used to write operating system crash dumps. The format follows
         BOOTED_DEV and is independent of the console presentation layer. The
         value is preserved across warm bootstraps. The default value when the
         system is shipped indicates an implementation-specific device or
         NULL (00₁₆).
Table 2-7: Required Environment Variables (Continued)


Environment Variable
ID₁₆   Symbol   Description


0B ENABLE_AUDIT¹ Indicates whether audit trail messages are to be generated
during bootstrap. Defined values and the action invoked are:

• "OFF" (46464F₁₆). Audit trail messages suppressed.

• "ON" (4E4F₁₆). Audit trail messages generated.

The default value when the system is shipped is "ON".


0C LICENSE¹ Software license in effect. The value is derived in an
implementation-specific manner during console initialization.
Defined values and (optional) software interpretation are:

• "MU" (554D₁₆) multiple user system.
• "SU" (5553₁₆) single user system.


0D CHAR_SET¹,² Current console terminal character set encoding.
Defined values are given in Table 2-9. The default
value when the system is shipped is determined by
the manufacturing site.


0E LANGUAGE¹,² Current console terminal language. Defined values
are given in Table 2-8. The default value when the
system is shipped is determined by the manufacturing site.


0F TTY_DEV¹,²,³ Current console terminal unit. Indicates which entry
of the CTB table corresponds to the actual console
terminal. The value is preserved across warm bootstraps.
The default value is "0" (30₁₆).


10-3F Reserved for DIGITAL.
40-7F Reserved for console implementation use.
80-FF Reserved for system software use.


¹ Nonvolatile. The last value saved by system software or set by console commands is
preserved across system initializations, cold bootstraps, and long power outages.

² Warm nonvolatile. The last value set by system software is preserved across warm
bootstraps and restarts.

³ Read-only. The variable cannot be modified by system software or console commands.
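
The hexadecimal values listed beside the string values in Table 2-7, such as "OFF" (46464F₁₆), "ON" (4E4F₁₆), and "MU" (554D₁₆), are consistent with reading the ASCII bytes of the string as a little-endian integer, first character in the low-order byte. The following C sketch illustrates that reading. The helper name env_string_as_hex is hypothetical and is not part of the console interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack up to eight ASCII characters into a 64-bit value, with the first
 * character in the low-order byte. Hypothetical helper, for illustration. */
uint64_t env_string_as_hex(const char *s)
{
    uint64_t value = 0;
    size_t len = strlen(s);

    if (len > 8)
        len = 8;                      /* only eight bytes fit in a quadword */
    for (size_t i = 0; i < len; i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return value;
}
```

Under this reading, "OFF" packs to 46464F₁₆ because 'O' (4F₁₆) occupies the low-order byte and the two 'F' characters (46₁₆) occupy the next two bytes.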
Table 2-8: Supported Languages


LANGUAGE₁₆   Language                    Character Set   GETC Bytes
0            None (cryptic)              ISO Latin-1     1
30           Dansk                       ISO Latin-1     1
32           Deutsch                     ISO Latin-1     1
34           Deutsch (Schweiz)           ISO Latin-1     1
36           English (American)          ISO Latin-1     1
38           English (British/Irish)     ISO Latin-1     1
3A           Español                     ISO Latin-1     1
3C           Français                    ISO Latin-1     1
3E           Français (Canadian)         ISO Latin-1     1
40           Français (Suisse Romande)   ISO Latin-1     1
42           Italiano                    ISO Latin-1     1
44           Nederlands                  ISO Latin-1     1
46           Norsk                       ISO Latin-1     1
48           Português                   ISO Latin-1     1
4A           Suomi                       ISO Latin-1     1
4C           Svenska                     ISO Latin-1     1
4E           Vlaams                      ISO Latin-1     1
Other        Reserved

Table 2-9: Supported Character Sets 


CHAR_SET₁₆   Character Set
0 ISO Latin-1 
Other Reserved. 
2.3 Console Callback Routines


System software can access certain system hardware components through a set of callback rou- 
tines provided by the Alpha console. These routines give system software an architecturally 
consistent and relatively simple interface to those components. 


All of the console callback routines may be used by system software when the operating sys- 
tem has only restricted functionality, such as during bootstrap or crash. When invoked in this 
context, the console may assume full control of system platform hardware. Some of the con- 
sole callback routines may be used by system software when the operating system is fully 
functional. Such usage imposes constraints on the console implementation. 


All routines must be called by system software executing in kernel mode. All routines require 
that the HWRPB and the per-CPU, CTB, and CRB offset blocks are virtually mapped and ker- 
nel read/write accessible. If these conditions are not met, the results are UNDEFINED. If 
conditions from within user mode are not met, the results are UNPREDICTABLE. Some of 
the routines execute correctly only at or above certain IPLs. 


The routines must never modify any processor registers except those explicitly indicated by 
the routine descriptions. 


2.3.1 System Software Use of Console Callback Routines 
The console callback routines present an environment to the operating system in which the fol- 
lowing behavior must be implemented. These routines must: 
• Not alter the current IPL
• Not alter the current execution mode
• Not disable or mask interrupts
• Not alter any registers except as explicitly defined by the routine interface
• Not alter the existing memory management policy
• Not usurp any existing interrupt mechanisms
• Be interruptible
• Ensure timely completion


Once the operating system is bootstrapped, the console must not reclaim resources transferred 
to that operating system. This includes both the issuing and servicing of I/O device interrupts, 
interprocessor interrupts, and exceptions. 


It is the responsibility of the console implementation to ensure that these console callback
routines may be invoked at multiple IPLs, may be interrupted, and may be invoked by multiple
system software threads. The operation of these routines must appear to be atomic to the
calling system software even if that software thread is interrupted.

In a multiprocessor system, some console routines may be invoked only on the primary
processor. A secondary processor may invoke only a subset of these routines and then only under a
limited set of conditions. These conditions are explicitly stated in the routine descriptions; if
violated, the results are UNDEFINED.


2.3.2 System Software Invocation of Console Callback Routines 


With the exception of the FIXUP routine, all of the routines are accessed uniformly through a 
common DISPATCH procedure. The target routine is identified by a function code. All con- 
sole callback routines are invoked using the Alpha standard calling conventions. 


Any memory management exceptions generated by incorrect mapping or inaccessibility of 
console callback routine parameters produce UNDEFINED results. This occurs naturally for 
those console callback routines that are intended for use while the operating system is fully 
functional; these routines execute in the unmodified context of the operating system. 


For those routines intended for use only while the operating system has restricted
functionality, the DISPATCH routine must ensure that any conflicts in mapping or accessibility are
resolved before permitting the console to gain control of the system platform hardware.


2.3.3 Console Callback Routine Summary

The console callback routines fall into four functional groups:
1. Console terminal interaction 
2. Generic I/O device access 
3. Environment variable manipulation 
4. Miscellaneous 


The hexadecimal function code, name, and function for each routine are summarized in
Table 2-10.


Table 2-10: Console Callback Routines


Code₁₆   Name              Function Invoked

Console Terminal Routines
01       GETC              Get character from console terminal
02       PUTS              Put byte stream to console terminal
03       RESET_TERM        Reset console terminal to default
04       SET_TERM_INT      Set console terminal interrupts
05       SET_TERM_CTL      Set console terminal controls
06       PROCESS_KEYCODE   Process and translate keycode
07-0F                      Reserved

Console Generic I/O Device Routines
10       OPEN              Open I/O device for access
11       CLOSE             Close I/O device for access
12       IOCTL             Perform I/O device-specific operations
13       READ              Read I/O device
14       WRITE             Write I/O device
15-1F                      Reserved

Console Environment Variable Routines
20       SET_ENV           Set (write) an environment variable
21       RESET_ENV         Reset (default) an environment variable
22       GET_ENV           Get (read) an environment variable
23       SAVE_ENV          Save current environment variables

Console Miscellaneous Routines
30       PSWITCH           Switch primary processor
(None)   FIXUP             Remap console callback routines
(None)   DISPATCH          Access console callback routine
32       BIOS_EMUL         Run BIOS emulation callback routine
Other                      Reserved


All Alpha consoles must implement:


• All console terminal routines except PROCESS_KEYCODE.

• All console generic I/O device routines.

• All environment variable routines except SAVE_ENV.

• The FIXUP and DISPATCH miscellaneous routines.


The PSWITCH routine is required for all Alpha multiprocessor systems that support dynamic 
primary switching. See Section 3.5.6. 
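
The function codes of Table 2-10 and the list of required routines can be captured in a small C sketch. The CB_ identifier names and the cb_is_required helper are illustrative assumptions, not names defined by the architecture. FIXUP and DISPATCH have no function code and are therefore not represented.

```c
/* Function codes from Table 2-10. The CB_ names are illustrative only. */
enum console_callback {
    CB_GETC = 0x01, CB_PUTS = 0x02, CB_RESET_TERM = 0x03,
    CB_SET_TERM_INT = 0x04, CB_SET_TERM_CTL = 0x05,
    CB_PROCESS_KEYCODE = 0x06,
    CB_OPEN = 0x10, CB_CLOSE = 0x11, CB_IOCTL = 0x12,
    CB_READ = 0x13, CB_WRITE = 0x14,
    CB_SET_ENV = 0x20, CB_RESET_ENV = 0x21, CB_GET_ENV = 0x22,
    CB_SAVE_ENV = 0x23,
    CB_PSWITCH = 0x30, CB_BIOS_EMUL = 0x32
};

/* Returns nonzero if every Alpha console must implement the routine.
 * PSWITCH is required only on multiprocessor systems that support
 * dynamic primary switching, so it is reported as not universally
 * required; BIOS_EMUL is not in the list of required routines. */
int cb_is_required(enum console_callback code)
{
    switch (code) {
    case CB_PROCESS_KEYCODE:
    case CB_SAVE_ENV:
    case CB_PSWITCH:
    case CB_BIOS_EMUL:
        return 0;
    default:
        return 1;
    }
}
```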


2.3.4 Console Terminal Routines 


Alpha consoles provide system software with a consistent interface to the console terminal, 
regardless of the physical realization of that terminal. This interface consists of the console ter- 
minal block (CTB) table and a number of console terminal routines. Each CTB contains the 
characteristics of a terminal device that can be accessed through the console terminal routines;
see Section 2.3.8.2.

There is only one console terminal. The CTB table may contain multiple CTBs and the console
terminal routines may be used to access multiple terminal devices. Each terminal device
is identified by a "unit number" that is the index of its CTB within the CTB table. The 
TTY_DEV environment variable indicates the unit, hence the CTB, of the console terminal. 
The console terminal unit is determined at system bootstrap and cannot be altered by system 
software. Console terminal device interrupts are delivered at the console terminal device IPL 
to the primary processor; interrupts can be redirected to a secondary only when switching the 
primary processor. 


The console terminal routines permit system software to access the console terminal in a 
device-independent way. These routines may be invoked while the operating system is fully 
functional as well as during operating system bootstrap or crash. All console terminal routines 
are subject to the constraints given in Section 2.3.1. These routines must: 


• Not alter the current IPL or current mode.

These routines must be invoked in kernel mode at or above the console terminal
device IPL.

• Not alter the existing memory management policy.

All internal pointers must have been remapped by FIXUP.

• Not block interrupts.

The operating system must be capable of continuing to receive hardware interrupts at
higher IPLs.

• Be interruptible and re-entrant.

These routines may be invoked at multiple IPLs and their execution may be
interrupted. However, console terminal callback operations are not necessarily atomic.
In the event of re-entrant invocations, it is UNPREDICTABLE whether or not the
interrupted operation will fail and characters may be transmitted or received out of
order.


The time required for console terminal routines to complete is UNPREDICTABLE; however, 
a console implementation will attempt to minimize the time whenever possible. 


Software Note: 


Implementations must limit the execution time to significantly less than the interval clock
interrupt period. A return after partial operation completion is preferable to long latency.

When invoking these routines, system software must:

• Be executing in kernel mode at or above the console terminal device IPL.

If these routines are invoked in other modes, their execution causes
UNPREDICTABLE operation. If invoked at lower IPLs, their execution causes
UNDEFINED operation.

• Be executing on the primary processor in a multiprocessor configuration.

If these routines are invoked on secondary processors in kernel mode, their execution
causes UNDEFINED operation.

• Be prepared to service any resulting console terminal interrupts, if enabled.

System software must provide valid interrupt service routines for the console terminal
transmit and receive interrupts. The operating system interrupt service routines must
be established before enabling interrupts; otherwise the operation of the system is
UNDEFINED.


Programming Note: 


Any console terminal interrupt service routines established by the console before 
transferring control to operating system software are not transferred to the operating 
system nor are they remapped by FIXUP. Any console terminal interrupts will be 
delivered only after the operating system lowers IPL from the console terminal device IPL. 


Implementation Note: 


The implementation of console terminal I/O interrupts is specific to the system hardware 
platform. An example of implementation-specific characteristics is console terminal SCB 
vectors. 
2.3.4.1 GETC - Get Character from Console Terminal


Format:

char = DISPATCH ( GETC,unit )

Inputs:

GETC   = R16;  GETC function code - 01₁₆
unit   = R17;  Terminal device unit number
retadr = R26;  Return address

Outputs:

char = R0;  Returned character and status:

R0<63:61>  ‘000’  Success, character received
           ‘001’  Success, character received, more to be read
           ‘100’  Failure, character not yet ready for reception
           ‘110’  Failure, character received with error
           ‘111’  Failure, character received with error, more to be read
R0<60:48>  Device-specific error status
R0<47:40>  SBZ
R0<39:32>  Terminal device unit number returning character
R0<31:0>   Character read from console terminal

GETC attempts to read one character from a console terminal device and, if successful, returns
that character in R0<31:0>. The character is not echoed on the terminal device. The size of the
returned character is from one to four bytes and is a function of the current character set
encoding and language (see Table 2-8). The routine performs any necessary keycode mapping.


For implementations that support multiple directly addressable terminal devices, R17 contains
the unit number from which to read the character. If the implementation does not support
multiple terminal devices or if the devices are not directly addressable, R17 should be zero. The
unit number from which the character was read is returned in R0<39:32>. If the implementation
does not support multiple terminal devices, R0<39:32> is returned as zero.


GETC returns character reception status in R0<63:61>. If received characters are buffered by
the console terminal, R0<61> is set to ‘1’ whenever additional characters are available. If
GETC returns a character without error, R0<63:62> is set to ‘00’. If no character is yet ready,
R0<63:62> is set to ‘10’. If an error is encountered obtaining a character, R0<63:62> is set to
‘11’. Examples of errors during character reception include data overrun or loss of carrier.

When an error is returned by GETC, the contents of R0<31:0> and R0<60:48> depend on the
capabilities of the underlying hardware. Implementations in which the hardware returns the
character in error must provide that character in R0<31:0>. Additional device-specific error
status may be contained in R0<60:48>.
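
The R0 layout described above can be unpacked with a few shifts and masks. The following C sketch assumes the 64-bit value returned in R0 is already available as a uint64_t; the structure and function names are illustrative and not defined by the architecture.

```c
#include <stdint.h>

/* Decoded view of the value GETC returns in R0. Field names are illustrative. */
struct getc_result {
    unsigned status;     /* R0<63:61> */
    unsigned dev_error;  /* R0<60:48>, device-specific error status */
    unsigned unit;       /* R0<39:32>, unit that returned the character */
    uint32_t ch;         /* R0<31:0>, character read */
};

struct getc_result getc_decode(uint64_t r0)
{
    struct getc_result r;

    r.status    = (unsigned)((r0 >> 61) & 0x7);
    r.dev_error = (unsigned)((r0 >> 48) & 0x1FFF);
    r.unit      = (unsigned)((r0 >> 32) & 0xFF);
    r.ch        = (uint32_t)(r0 & 0xFFFFFFFFu);
    return r;
}

/* R0<63:62> = '00': a character was returned without error. */
int getc_ok(struct getc_result r)
{
    return (r.status & 0x6) == 0;
}

/* R0<61> = '1': additional buffered characters are available. */
int getc_more(struct getc_result r)
{
    return (int)(r.status & 0x1);
}
```

A caller would typically test getc_ok first and consult dev_error only when an error status ('110' or '111') is returned.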


When appropriate, GETC performs special keyboard operations such as turning keyboard 
LEDs on or off. Such action is based on the incoming stream of keycodes delivered by the con- 
sole terminal. 


The return address indicated by R26 should be mapped and executable by the kernel. 
2.3.4.2 PROCESS_KEYCODE - Process and Translate Keycode


Format:

char = DISPATCH ( PROCESS_KEYCODE,unit,keycode,again )

Inputs:

PROCESS_KEYCODE = R16;  PROCESS_KEYCODE function code - 06₁₆
unit            = R17;  Terminal device unit number
keycode         = R18;  Keycode to be processed
again           = R19;  ‘1’ if calling again for same keycode
                        ‘0’ otherwise
retadr          = R26;  Return address

Outputs:

char = R0;  Translated character and status:

R0<63:61>  ‘000’  Success, character returned
           ‘101’  Failure, more time needed to process keycode
           ‘110’  Failure, device not supported by routine or routine not supported
           ‘111’  Failure, no character; more keycodes needed or illegal
                  sequence encountered
R0<60>     ‘0’    Success in correcting severe error
           ‘1’    Failure in correcting severe error
R0<59:32>  SBZ
R0<31:0>   Translated character


PROCESS_KEYCODE attempts to translate the keycode contained in R18 and, if successful,
returns the character in R0<31:0>. The translation is based on the current character set
encoding, language, and console terminal device state contained in the appropriate CTB. The
translated character may be from one to four bytes. For implementations that support multiple
terminal devices, R17 contains the unit number of the keyboard; otherwise, R17 should be zero.


Implementation Note:

For ISO Latin-1 character set encoding, PROCESS_KEYCODE returns a one-byte
character.

PROCESS_KEYCODE returns keycode translation status in R0<63:61>. The processing falls
into one of several cases:


1. The keycode, along with previous keycodes if any, translates into a character from the
currently selected character set. In this case, R0<63:61> is set to ‘000’.


2. The keycode, along with previously entered keycodes if any, does not translate into a
character from the currently selected character set. This is because either:

- Not enough keycodes have yet been entered to produce a character in the currently
selected character set.

- The keycodes entered to this point indicate a severe keyboard error status.

- The keycodes entered to this point form an illegal or unsupported keycode
sequence.

In this case, R0<63:61> is set to ‘111’.


3. The console terminal device for which keycode translation is being performed is not
supported by the PROCESS_KEYCODE implementation or the console implementation
does not support PROCESS_KEYCODE. In this case, R0<63:61> is set to ‘110’.


4. The keycode cannot be processed in a reasonable amount of time; multiple invocations
of PROCESS_KEYCODE are necessary. In this case, the routine returns with
R0<63:61> set to ‘101’. The subsequent call(s) should be made with the same keycode
in R18 and R19 set to ‘1’.


Implementation Note: 

It may not be possible for an implementation to perform all the actions associated 
with special keycodes (such as turning on LEDs) in a timely manner. The 
PROCESS_KEYCODE routine must return after partial completion of an 
operation if necessary. It is the responsibility of the console to ensure that 
subsequent calls make forward progress. The delay between successive operating 
system calls is UNPREDICTABLE, although the operating system should attempt 
to complete the operation in a timely fashion. See Section 2.3.4. 


In all but the first case, the contents of R0<31:0> are UNPREDICTABLE.


When certain severe keyboard errors are encountered, PROCESS_KEYCODE attempts to
correct them by performing special keyboard operations. Those severe errors that may be
corrected are device specific and contained in the terminal device CTB. If an error is
encountered and the attempt to correct the error is unsuccessful, R0<60> is set to ‘1’; otherwise
R0<60> is set to ‘0’.


The keyboard state recorded in the CTB is updated appropriately as the input stream of
keycodes is processed. If appropriate, PROCESS_KEYCODE may buffer some of the keycodes
in the CTB keycode buffer. The supported keyboard state changes are device specific and are
listed in the device CTB.
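
The four status cases map naturally onto a caller-side decision about what to do next. The following C sketch classifies the R0 value returned by PROCESS_KEYCODE; the enumeration and function names are illustrative assumptions, and status encodings not listed in the routine description are reported separately rather than interpreted.

```c
#include <stdint.h>

/* Caller-side action suggested by R0<63:61>. Names are illustrative. */
enum keycode_action {
    KEY_DONE,         /* '000': translated character is in R0<31:0> */
    KEY_CALL_AGAIN,   /* '101': call again, same keycode in R18, R19 = 1 */
    KEY_UNSUPPORTED,  /* '110': device or routine not supported */
    KEY_NO_CHAR,      /* '111': more keycodes needed or illegal sequence */
    KEY_RESERVED      /* encoding not listed in the routine description */
};

enum keycode_action keycode_next_action(uint64_t r0)
{
    switch ((r0 >> 61) & 0x7) {
    case 0x0: return KEY_DONE;
    case 0x5: return KEY_CALL_AGAIN;
    case 0x6: return KEY_UNSUPPORTED;
    case 0x7: return KEY_NO_CHAR;
    default:  return KEY_RESERVED;
    }
}

/* R0<60> = '1': an attempt to correct a severe keyboard error failed. */
int keycode_correction_failed(uint64_t r0)
{
    return (int)((r0 >> 60) & 0x1);
}
```

Only KEY_DONE makes R0<31:0> meaningful; in every other case the low 32 bits should be ignored.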


The return address indicated by R26 should be mapped and executable by the kernel. 
2.3.4.3 PUTS - Put Stream to Console Terminal


Format:

wcount = DISPATCH ( PUTS,unit,address,length )

Inputs:

PUTS    = R16;  PUTS function code - 02₁₆
unit    = R17;  Terminal device unit number
address = R18;  Virtual address of byte stream to be written
length  = R19;  Count of bytes to be written
retadr  = R26;  Return address

Outputs:

wcount = R0;  Count of bytes written and status:

R0<63:61>  ‘000’  Success, all bytes written
           ‘001’  Success, some bytes written
           ‘100’  Failure, no bytes written
                  Terminal error encountered
           ‘111’  Failure, some bytes written
                  Terminal error encountered
R0<60:48>  Device-specific error status
R0<47:32>  SBZ
R0<31:0>   Count of bytes written (unsigned)


PUTS attempts to write a number of bytes to a console terminal device. R18 contains the base
virtual address of the memory-resident byte stream; R19 contains its 32-bit size in bytes. The
byte stream is written in order with no interpretation or special handling. The count of the
bytes transmitted is returned in R0<31:0>.


Programming Note:

For multiple-byte character set encodings, the returned byte count may indicate a partial
character transmission.


For implementations that support multiple terminal devices, R17 contains the unit number to
which the byte stream is to be written; otherwise, R17 should be zero.

PUTS returns byte stream transmission status in R0<63:61>. If only a portion of the byte
stream was written, R0<61> is set to ‘1’. If no error is encountered, R0<63:62> is set to ‘00’.
If no bytes were written because the terminal was not ready, R0<63:62> is set to ‘10’. If an
error is encountered writing a byte, R0<63:62> is set to ‘11’. Examples of errors during byte
transmission include data overrun or loss of carrier.


When an error is returned by PUTS, additional device-specific error status may be contained
in R0<60:48>.


Multiple invocations of PUTS may be necessary because the console terminal may accept only 
a very few bytes in a reasonable period of time. 
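
Because a single call may accept only part of the stream, callers usually wrap PUTS in a loop that advances by the count returned in R0<31:0>. The C sketch below shows one way to structure such a loop. The dispatch_fn function-pointer type is a hypothetical C-level stand-in for DISPATCH (arguments in R16 through R19, result in R0), and mock_dispatch is a demonstration-only console that accepts at most four bytes per call; neither is defined by the architecture.

```c
#include <stddef.h>
#include <stdint.h>

#define PUTS_CODE 0x02u   /* PUTS function code */

/* Hypothetical stand-in for DISPATCH: R16..R19 in, R0 out. */
typedef uint64_t (*dispatch_fn)(uint64_t r16, uint64_t r17,
                                uint64_t r18, uint64_t r19);

/* Write len bytes, re-invoking PUTS after partial writes or "not ready".
 * Stops on a terminal error (R0<63:62> = '11') or after max_calls calls.
 * Returns the number of bytes actually written. */
size_t puts_all(dispatch_fn dispatch, uint64_t unit,
                const char *buf, size_t len, unsigned max_calls)
{
    size_t done = 0;

    while (done < len && max_calls-- > 0) {
        size_t chunk = len - done;
        if (chunk > 0xFFFFFFFFu)
            chunk = 0xFFFFFFFFu;          /* R19 carries a 32-bit size */
        uint64_t r0 = dispatch(PUTS_CODE, unit,
                               (uint64_t)(uintptr_t)(buf + done),
                               (uint64_t)chunk);
        done += (size_t)(r0 & 0xFFFFFFFFu);   /* R0<31:0>: bytes written */
        if (((r0 >> 62) & 0x3) == 0x3)
            break;                            /* terminal error */
    }
    return done;
}

/* Demonstration-only mock console: accepts at most 4 bytes per call. */
char mock_out[64];
size_t mock_len = 0;

uint64_t mock_dispatch(uint64_t r16, uint64_t r17, uint64_t r18, uint64_t r19)
{
    const char *src = (const char *)(uintptr_t)r18;
    uint64_t room = sizeof mock_out - mock_len;
    uint64_t n = r19 < 4 ? r19 : 4;

    (void)r16;
    (void)r17;
    if (n > room)
        n = room;
    for (uint64_t i = 0; i < n; i++)
        mock_out[mock_len++] = src[i];
    /* '000' all written, '001' some written, '100' none written */
    uint64_t status = (n == r19) ? 0x0 : (n == 0 ? 0x4 : 0x1);
    return (status << 61) | n;
}
```

The max_calls bound is a design choice of this sketch: it keeps the caller from spinning indefinitely when the terminal repeatedly reports that it is not ready.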


The output byte stream located by R18 should be mapped and read accessible by the kernel; 
the return address indicated by R26 should be mapped and executable by the kernel. 
2.3.4.4 RESET_TERM - Reset Console Terminal to Default Parameters


Format:

status = DISPATCH ( RESET_TERM,unit )

Inputs:

RESET_TERM = R16;  RESET_TERM function code - 03₁₆
unit       = R17;  Terminal device unit number
retadr     = R26;  Return address

Outputs:

status = R0;  Status:

R0<63>    ‘0’  Success, terminal reset
          ‘1’  Failure, terminal not fully reset
R0<62:0>  SBZ


RESET_TERM resets a console terminal device and its CTB to their initial, default state. All 
errors in the CTB are cleared. For implementations that support multiple terminal devices, 
R17 contains the unit number to be reset; otherwise, R17 should be zero. 


The CTB describes the capabilities of the terminal device and its initial, default state. Depend- 
ing on the terminal device type and particular console implementation, other terminal devices 
may be affected by the routine. 


Programming Note: 


For example, if multiple terminal units share a common interrupt, that interrupt may be 
disabled or enabled for all. 


If the console terminal is successfully reset, RESET_TERM returns with R0<63> set to ‘0’. If
errors are encountered, the routine attempts to return the console terminal to a usable state and
then returns with R0<63> set to ‘1’.


The return address indicated by R26 should be mapped and executable by the kernel. 
2.3.4.5 SET_TERM_CTL - Set Console Terminal Controls


Format:

status = DISPATCH ( SET_TERM_CTL,unit,ctb )

Inputs:

SET_TERM_CTL = R16;  SET_TERM_CTL function code - 05₁₆
unit         = R17;  Terminal device unit number
ctb          = R18;  Virtual address of CTB
retadr       = R26;  Return address

Outputs:

status = R0;  Status:

R0<63>     ‘0’  Success, requested change completed
           ‘1’  Failure, change not completed
R0<62:32>  SBZ
R0<31:0>   Offset to offending CTB field (unsigned)


SET_TERM_CTL, if successful, changes the characteristics of a console terminal device and 
updates its CTB. The changes are specified by fields contained in a CTB located by R18. The 
characteristics that can be changed, hence the active CTB fields, depend on the console termi- 
nal device type. 


For implementations that support multiple terminal devices, R17 contains the unit number to 
be reset; otherwise, R17 should be zero. 


If the console terminal characteristics are successfully changed, SET_TERM_CTL returns
with R0<63> set to ‘0’. If errors are encountered or if the terminal device does not support the
requested settings, the routine attempts to return the device to the previous usable state and
then returns with R0<63> set to ‘1’ and R0<31:0> set to the offset of an offending or
unsupported field in the CTB located by R18. Regardless of success or failure, the device CTB table
entry always contains the current device characteristics upon routine return. SET_TERM_CTL
returns the CTB located by R18 without modification.
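
A caller can separate the success flag from the offending-field offset with a single test. The following C sketch is a minimal illustration; the function name and the use of -1 as a "no offending field" marker are assumptions of this example, not part of the routine interface.

```c
#include <stdint.h>

/* Interpret the R0 value returned by SET_TERM_CTL.
 * Returns -1 on success (R0<63> = '0'); otherwise returns the unsigned
 * offset, from R0<31:0>, of an offending or unsupported field in the
 * CTB that the caller passed in R18. */
int64_t set_term_ctl_bad_offset(uint64_t r0)
{
    if ((r0 >> 63) == 0)
        return -1;
    return (int64_t)(r0 & 0xFFFFFFFFu);
}
```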


The CTB located by R18 should be mapped and read accessible by the kernel; the return 
address indicated by R26 should be mapped and executable by the kernel. 
2.3.4.6 SET_TERM_INT - Set Console Terminal Interrupts


Format:

status = DISPATCH ( SET_TERM_INT,unit,mask )

Inputs:

SET_TERM_INT = R16;  SET_TERM_INT function code - 04₁₆
unit         = R17;  Terminal device unit number
mask         = R18;  Bit encoded mask:

R18<63:10>  SBZ
R18<9:8>    ‘01’  No change to receive interrupts
            ‘00’  Disable receive interrupts
            ‘1X’  Enable receive interrupts
R18<7:2>    SBZ
R18<1:0>    ‘01’  No change to transmit interrupts
            ‘00’  Disable transmit interrupts
            ‘1X’  Enable transmit interrupts

retadr       = R26;  Return address

Outputs:

status = R0;  Status:

R0<63>    ‘0’  Success
          ‘1’  Failure, operation not supported
R0<62:2>  SBZ
R0<1>     ‘1’  Receive interrupts enabled
          ‘0’  Receive interrupts disabled
R0<0>     ‘1’  Transmit interrupts enabled
          ‘0’  Transmit interrupts disabled


SET_TERM_INT reads, enables, and disables transmit and receive interrupts from a console
terminal device and updates its CTB. For implementations that support multiple terminal
devices, R17 contains the unit number to be reset; otherwise, R17 should be zero.


If the interrupt settings are successfully changed, the routine returns with R0<63> set to ‘0’. If
the terminal device does not support the requested setting, the routine returns with R0<63> set
to ‘1’.


Programming Note:

For example, a device that has a unified transmit/receive interrupt would not support a
request to enable transmit interrupts while leaving receive interrupts disabled.

Regardless of success or failure, the routine always returns with the previous settings in
R0<1:0>. The current state of the interrupt settings can be read without change by invoking
SET_TERM_INT with R18<1:0> and R18<9:8> set to ‘01’.
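
The mask passed in R18 and the settings returned in R0 can be built and decoded as in the following C sketch. All type and function names are illustrative assumptions; the ‘1X’ enable encoding is represented here as ‘10’.

```c
#include <stdint.h>

/* Two-bit request encodings used in R18<9:8> and R18<1:0>. */
enum term_int_op {
    TERM_INT_DISABLE   = 0x0,   /* '00' */
    TERM_INT_NO_CHANGE = 0x1,   /* '01' */
    TERM_INT_ENABLE    = 0x2    /* '1X', encoded here as '10' */
};

/* Build the R18 mask: receive request in <9:8>, transmit request in <1:0>. */
uint64_t set_term_int_mask(enum term_int_op rx, enum term_int_op tx)
{
    return ((uint64_t)rx << 8) | (uint64_t)tx;
}

/* Decoded R0 value; rx/tx reflect the previous settings in R0<1:0>. */
struct term_int_state {
    int failed;       /* R0<63> */
    int rx_enabled;   /* R0<1>  */
    int tx_enabled;   /* R0<0>  */
};

struct term_int_state set_term_int_decode(uint64_t r0)
{
    struct term_int_state s;

    s.failed     = (int)(r0 >> 63);
    s.rx_enabled = (int)((r0 >> 1) & 0x1);
    s.tx_enabled = (int)(r0 & 0x1);
    return s;
}
```

Passing TERM_INT_NO_CHANGE for both requests yields a mask of 101₁₆, which reads the current settings without altering them.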


The return address indicated by R26 should be mapped and executable by the kernel. 
2.3.5 Console Generic I/O Device Routines


The Alpha console provides primitive generic I/O device routines for system software use dur- 
ing the bootstrap or crash process. These routines serve in place of the more sophisticated 
system software I/O drivers until such time as these drivers can be established. These routines 
may also be used to access console-private devices that are not directly accessible by the 
processor. 


During the bootstrap process, these routines can be used to acquire a secondary bootstrap pro- 
gram from a system bootstrap device or write messages to a terminal other than the logical 
console terminal. When the operating system is about to crash, these routines can be used to 
write dump files. 


These routines are not intended for use while the operating system is fully functional. These 
routines may: 


• Alter the current IPL.

The console may raise the current IPL. It may lower the current IPL only insofar as
the state presented to the operating system remains consistent, as though the IPL had
not been lowered. The console must ensure that interrupts that would not have been
delivered at the caller’s IPL are pended and delivered to the operating system at the
conclusion of the callback.

• Block interrupts.

These routines may cause any and all interrupts to be blocked or delivered to and
serviced by the console for the duration of the routine execution.

• Block exceptions.

These routines may cause any and all exceptions to be blocked or delivered to and
serviced by the console for the duration of the routine execution.

• Alter the existing memory management policy.


The console may substitute a console-private (or bootstrap address) mapping for the 
duration of the routine execution. 


Programming Note: 
The console must resolve any virtually addressed arguments before altering the 
existing memory management policy. 


• Take any length of time for completion.


The operating system cannot guarantee timeliness when invoking these routines. Any 
operating system timer may have expired before their return. The time necessary for 
completion is UNPREDICTABLE; however, a console implementation will attempt 
to minimize the time whenever possible. 


Before returning to the invoking system software, these routines must restore any altered
processor state. These routines must return to the calling system software at the IPL and in the
memory management policy of that software.




System software invokes these routines synchronously. When invoking these routines, system 
software must: 


• Be executing in kernel mode.


If these routines are invoked in other modes, their execution causes 
UNPREDICTABLE operation. 


• Be executing on the primary processor in a multiprocessor configuration.


If these routines are invoked on other processors, their execution causes 
UNDEFINED operation. 
2.3.5.1 CLOSE - Close Generic I/O Device for Access


Format:

status = DISPATCH ( CLOSE,channel )

Inputs:

CLOSE   = R16;  CLOSE function code - 11₁₆
channel = R17;  Channel to close
retadr  = R26;  Return address

Outputs:

status = R0;  Status:

R0<63>     ‘0’  Success
           ‘1’  Failure
R0<62:60>  SBZ
R0<59:32>  Device-specific error status
R0<31:0>   SBZ


CLOSE deassigns the channel number from a previously opened block storage I/O device.
The channel number is free to be reassigned. The I/O device must be reopened before any
subsequent accesses.


CLOSE returns status in R0<63>. If the channel was open and the close is successful, R0<63>
is set to ‘0’; otherwise R0<63> is set to ‘1’ and additional device-specific status is recorded in
R0<59:32>.


For magnetic tape devices, CLOSE does not affect the current tape position, nor is any rewind 
of the tape performed. 


The return address indicated by R26 should be mapped and executable by the kernel.
2.3.5.2 IOCTL - Perform Device-Specific Operations


Format:

count = DISPATCH ( IOCTL,channel,R18,R19,R20,R21 )

Inputs:

IOCTL   = R16;  IOCTL function code - 12₁₆
channel = R17;  Channel number of device to be accessed
retadr  = R26;  Return address


For Magnetic Tape Devices Only:

operate = R18;  Tape positioning operation:
                ‘01’  For skip to next/previous interrecord gap
                ‘02’  For skip over tape mark
                ‘03’  For rewind
                ‘04’  For write tape mark
count   = R19;  Number of skips to perform (signed)
        = R20-R21;  Reserved for future use as inputs

Outputs:


For Magnetic Tape Devices Only:

count = R0;  Number of skips performed and status:

R0<63:62>  ‘00’  Success
           ‘10’  Failure, position not found
           ‘11’  Hardware failure
R0<61:60>  SBZ
R0<59:32>  Device-specific error status
R0<31:0>   Number of skips actually performed (signed)


IOCTL performs special device-specific operations on I/O devices. The operation performed
and the interpretation of any additional arguments passed in R18-R21 are functions of the
device type as designated by the channel number passed in R17.


For magnetic tape devices, the following operations are defined:


• ‘01’ — IOCTL relocates the current tape position by skipping over a number of
  interrecord gaps. The direction of the skip and the number of gaps skipped is given by the
  signed 32-bit count in R19. Skipping with a count of ‘0’ does not change the current
  tape position. The number of gaps actually skipped is returned in R0<31:0>.


• ‘02’ — IOCTL relocates the current tape position by skipping over a number of tape
  marks. The direction of the skip and the number of marks skipped is given by the
  signed 32-bit count in R19. Skipping with a count of ‘0’ does not change the current
  tape position. The number of tape marks actually skipped is returned in R0<31:0>.


• ‘03’ — IOCTL rewinds the tape to the position just after the Beginning-of-Tape (BOT)
  marker. R0<31:0> is returned as SBZ.


• ‘04’ — IOCTL writes a tape mark starting at the current position. R0<31:0> is returned
  as SBZ.


IOCTL returns magnetic tape operation status in R0<63:62>. If the operation was successful,
R0<63:62> is set to ‘00’. If the tape positioning was not successful, the tape is left at the
position where the error occurred and R0<63:62> is set to ‘10’. Tape positioning may fail due to
encountering a BOT marker (R18 ‘01’ or ‘02’), encountering a tape mark (R18 ‘01’), or
running off the end of the tape. If a hardware device error is encountered, the final position of the
tape is UNPREDICTABLE and R0<63:62> is set to ‘11’. In the event of an error, additional
device-specific status is recorded in R0<59:32>.


The return address indicated by R26 should be mapped and executable by the kernel. 
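
As an illustration of the R0 layout for magnetic tape IOCTL calls, a caller might unpack the returned value as in the following minimal C sketch. The names `tape_status`, `ioctl_result`, and `decode_ioctl_status` are hypothetical and not part of the architecture; the DISPATCH call itself is not shown, and the conversion of R0<31:0> to a signed value assumes two's complement arithmetic.

```c
#include <stdint.h>

/* Hypothetical names for the R0<63:62> codes returned by a tape IOCTL. */
enum tape_status {
    TAPE_OK            = 0x0, /* '00' success */
    TAPE_UNDEFINED     = 0x1, /* '01' not defined for IOCTL */
    TAPE_POS_NOT_FOUND = 0x2, /* '10' failure, position not found */
    TAPE_HW_FAILURE    = 0x3  /* '11' hardware failure */
};

struct ioctl_result {
    enum tape_status status;  /* R0<63:62> */
    uint32_t dev_status;      /* R0<59:32>, device-specific error status */
    int32_t skips;            /* R0<31:0>, signed number of skips performed */
};

/* Split a raw R0 value into its documented fields. */
struct ioctl_result decode_ioctl_status(uint64_t r0)
{
    struct ioctl_result r;
    r.status = (enum tape_status)(r0 >> 62);
    r.dev_status = (uint32_t)((r0 >> 32) & 0x0FFFFFFFu);
    r.skips = (int32_t)(uint32_t)r0;  /* two's complement assumed */
    return r;
}
```

Because the skip count is signed, a backward skip that stops at the BOT marker would report a negative value in `skips` together with `TAPE_POS_NOT_FOUND`.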
2.3.5.3 OPEN — Open Generic I/O Device for Access


Format:
    channel = DISPATCH ( OPEN, devstr, length )
Inputs:
    OPEN    = R16;  OPEN function code — 10₁₆
    devstr  = R17;  Starting virtual address of byte string that contains the device
                    specification
    length  = R18;  Length of byte buffer
    retadr  = R26;  Return address
Outputs:
    channel = R0;   Assigned channel number and status:
                    R0<63:62>  ‘00’  Success
                               ‘10’  Failure, device does not exist
                               ‘11’  Failure, error, device cannot be
                                     accessed or prepared
                    R0<61:60>  SBZ
                    R0<59:32>  Device-specific error status
                    R0<31:0>   Assigned channel number of device


OPEN prepares a generic I/O device for use by the READ and WRITE routines. R17 contains 
the base virtual address of a byte string that specifies the complete device specification of the 
I/O device. The length of the string is given in R18. The format and contents of the device 
specification string follow that of the BOOTED_DEV environment variable. 


The routine assigns a unique channel number to the device. The channel number is returned in
R0 and must be used to reference the device in subsequent calls to the READ, WRITE, and
CLOSE routines.


OPEN returns status in R0<63:62>. If the I/O device exists and can be prepared for
subsequent accesses, R0<63:62> is set to ‘00’. If the device does not exist, R0<63:62> is set to ‘10’.
If the device exists, but errors are encountered in preparing the device, R0<63:62> is set to
‘11’ and additional device-specific status is recorded in R0<59:32>. In the latter two failure
cases, the channel number returned in R0<31:0> is UNPREDICTABLE.


All console implementations must support at least two concurrently opened generic I/O 
devices. Additional generic I/O devices may be supported. 
For magnetic tape devices, OPEN does not affect the current tape position, nor is any rewind
of the tape performed.


Multiple channels cannot be assigned to the same device; the second and any subsequent calls
to OPEN fail with R0<63:62> set to ‘11’ and R0<31:0> as UNPREDICTABLE. The status of
the first opened channel is unaffected.


The input string located by R17 should be mapped and read accessible by the kernel; the 
return address indicated by R26 should be mapped and executable by the kernel. 
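
Because R0<31:0> is UNPREDICTABLE when OPEN fails, a caller should examine R0<63:62> before using the channel number. The following minimal C sketch shows one way to do that; `open_channel_from_r0` is a hypothetical helper name, and the raw R0 value is assumed to have been captured from the DISPATCH call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true and stores the channel number only if OPEN reported '00'.
 * On failure *channel is left untouched, since R0<31:0> is UNPREDICTABLE. */
bool open_channel_from_r0(uint64_t r0, uint32_t *channel)
{
    unsigned status = (unsigned)(r0 >> 62);  /* R0<63:62> */
    if (status != 0)
        return false;
    *channel = (uint32_t)r0;                 /* R0<31:0> */
    return true;
}
```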
2.3.5.4 READ — Read Generic I/O Device


Format:
    rcount = DISPATCH ( READ, channel, count, address, block )
Inputs:
    READ    = R16;  READ function code — 13₁₆
    channel = R17;  Channel number of device to be accessed
    count   = R18;  Number of bytes to be read (should be multiple of the device’s
                    record length) (unsigned)
    address = R19;  Virtual address of buffer to read data into
    block   = R20;  Logical block number of data to read (used only by disk
                    devices)
    retadr  = R26;  Return address
Outputs:
    rcount  = R0;   Number of bytes read and status:
                    R0<63>     ‘0’  Success
                               ‘1’  Failure
                    R0<62>     ‘1’  EOT or Logical End of Device condition encountered
                               ‘0’  Otherwise
                    R0<61>     ‘1’  Illegal record length specified
                               ‘0’  Otherwise
                    R0<60>     ‘1’  Run off end of tape
                               ‘0’  Otherwise
                    R0<59:32>  Device-specific error status
                    R0<31:0>   Number of bytes actually read (unsigned)


READ causes data to be read from the generic I/O device designated by the channel number in
R17 and written to a memory buffer pointed to by R19. The 32-bit transfer byte count, hence
length of the buffer, is contained in R18. The buffer must be quadword aligned, virtually
mapped, and resident in physical memory.


READ returns transfer status in R0<63:60> and the number of bytes actually read, if any, in
R0<31:0>. If the routine is successful, R0<63> is set to ‘0’. If an error is encountered in
accessing the device, R0<63> is set to ‘1’. Additional device-specific status may be returned
in R0<59:32>.


The transfer byte count should be a multiple of the record length of the device. If the specified
byte count is not a multiple of the record length, R0<61> is set to ‘1’. If the count exceeds the
record length, the count is rounded down to the nearest multiple of the record length and
READ attempts to read that number of bytes. If the record length exceeds the count, it is
UNPREDICTABLE whether READ attempts to access the device. If no read attempt is made,
R0<63> is set to ‘1’.
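
The record-length rule above can be worked through in a few lines of C. This is only a sketch of the arithmetic a caller could use to predict the outcome; `xfer_plan` and `plan_transfer` are hypothetical names, and a zero record length is treated as unpredictable purely to avoid dividing by zero.

```c
#include <stdbool.h>
#include <stdint.h>

struct xfer_plan {
    bool illegal_length;  /* condition that would be reported in R0<61> */
    bool predictable;     /* false when the record length exceeds the count */
    uint32_t bytes;       /* bytes the routine attempts to transfer */
};

/* Apply the documented rounding rule to a requested byte count. */
struct xfer_plan plan_transfer(uint32_t count, uint32_t record_len)
{
    struct xfer_plan p = { false, true, count };
    if (record_len == 0) {            /* guard only; not an architected case */
        p.predictable = false;
        p.bytes = 0;
        return p;
    }
    if (count % record_len != 0) {
        p.illegal_length = true;
        if (count > record_len) {
            p.bytes = count - count % record_len;  /* round down */
        } else {
            p.predictable = false;    /* UNPREDICTABLE whether access occurs */
            p.bytes = 0;
        }
    }
    return p;
}
```

For example, a request for 1300 bytes on a device with 512-byte records is flagged as an illegal length and reduced to an attempted transfer of 1024 bytes.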


For magnetic tape devices, READ does not interpret the tape format or differentiate between
ANSI formatted and unformatted tapes. The routine reads the requested transfer byte count
starting at the current tape position. READ terminates when one of the following occurs:


• The specified number of bytes has been read. In this case, R0<63:60> is set to ‘0000’.


• An interrecord gap is encountered. In this case, the tape is positioned to the next
  position after the gap and R0<63:60> is set to ‘0000’.


• A tape mark is encountered. In this case, the tape is positioned to the next position after
  the tape mark and R0<63:60> is set to ‘0100’. (After calling READ and finding a tape
  mark, the caller can determine if the logical End-of-Volume or an empty file section has
  been found by calling READ again. The condition exists if the second READ returns
  with zero bytes read and a tape mark found.)


• The routine runs off the end of tape. In this case, R0<63:60> is set to ‘1001’.


READ ignores End-of-Tape (EOT) markers.
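
The two-READ check described in the tape mark case can be expressed as a small predicate over the two returned R0 values. This is a minimal C sketch; the function names are hypothetical, and the two READ calls themselves are assumed to have been issued already.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when R0<63:60> is '0100', the code READ returns on a tape mark. */
bool tape_mark_found(uint64_t r0)
{
    return (r0 >> 60) == 0x4;
}

/* True when a READ that found a tape mark is followed by a READ that
 * returns zero bytes and again reports a tape mark. */
bool end_of_volume_or_empty_section(uint64_t first_r0, uint64_t second_r0)
{
    return tape_mark_found(first_r0)
        && tape_mark_found(second_r0)
        && (uint32_t)second_r0 == 0;  /* R0<31:0> == 0 bytes read */
}
```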


For disk devices, READ does not understand the file structure of the device. The routine reads
the requested transfer byte count starting at the logical block number specified by R20. The
transfer continues until either the specified number of bytes has been read or the last logical
block on the device has been read. If the logical end of the device is encountered, then
R0<63:62> is set to ‘01’.


For network devices, READ interprets and removes any device-specific or protocol-specific 
packet headers. If a packet has been received, the remainder of the packet is copied into the 
specified buffer. If a packet has not been received, the routine returns with R0<31:0> set to
‘0’. Only those network packets that are specifically addressed to this system and are of the
specified protocol type are returned; broadcast packets are not returned. The actual packet size 
is dependent on the device and protocol; the characteristics of the network device and protocol 
are specified at the time of the channel OPEN. 


The buffer pointed to by R19 should be mapped and write accessible by the kernel; the return 
address indicated by R26 should be mapped and executable by the kernel.
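
The status bits that READ returns in R0<63:60>, together with the device-specific status and byte count, can be unpacked as in the following C sketch. The same bit layout is documented for WRITE. The names `rw_status` and `decode_rw_status` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct rw_status {
    bool failure;         /* R0<63> */
    bool eot_or_end;      /* R0<62>, EOT or logical end of device */
    bool illegal_length;  /* R0<61> */
    bool ran_off_tape;    /* R0<60> */
    uint32_t dev_status;  /* R0<59:32>, device-specific error status */
    uint32_t byte_count;  /* R0<31:0>, bytes actually transferred */
};

/* Split a raw R0 value from READ (or WRITE) into its documented fields. */
struct rw_status decode_rw_status(uint64_t r0)
{
    struct rw_status s;
    s.failure        = ((r0 >> 63) & 1) != 0;
    s.eot_or_end     = ((r0 >> 62) & 1) != 0;
    s.illegal_length = ((r0 >> 61) & 1) != 0;
    s.ran_off_tape   = ((r0 >> 60) & 1) != 0;
    s.dev_status     = (uint32_t)((r0 >> 32) & 0x0FFFFFFFu);
    s.byte_count     = (uint32_t)r0;
    return s;
}
```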




2.3.5.5 WRITE — Write Generic I/O Device 


Format:
    wcount = DISPATCH ( WRITE, channel, count, address, block )
Inputs:
    WRITE   = R16;  WRITE function code — 14₁₆
    channel = R17;  Channel number of device to be accessed
    count   = R18;  Number of bytes to be written (should be multiple of the
                    device’s record length) (unsigned)
    address = R19;  Virtual address of buffer to read data from
    block   = R20;  Logical block number of data to be written (used only by disk
                    devices)
    retadr  = R26;  Return address
Outputs:
    wcount  = R0;   Number of bytes written and status:
                    R0<63>     ‘0’  Success
                               ‘1’  Failure
                    R0<62>     ‘1’  EOT or Logical End of Device condition encountered
                               ‘0’  Otherwise
                    R0<61>     ‘1’  Illegal record length specified
                               ‘0’  Otherwise
                    R0<60>     ‘1’  If run off end of tape
                               ‘0’  Otherwise
                    R0<59:32>  Device-specific error status
                    R0<31:0>   Number of bytes actually written (unsigned)


WRITE causes data to be written to the generic I/O device designated by the channel number
in R17 and read from a memory buffer pointed to by R19. The 32-bit transfer byte count,
hence length of the buffer, is contained in R18. The buffer must be quadword aligned,
virtually mapped, and resident in physical memory.


WRITE returns transfer status in R0<63:60> and the number of bytes actually written, if any,
in R0<31:0>. If the routine is successful, R0<63> is set to ‘0’. If an error is encountered in
accessing the device, R0<63> is set to ‘1’. Additional device-specific status may be returned
in R0<59:32>.


The transfer byte count should be a multiple of the record length of the device. If the specified
byte count is not a multiple of the record length, R0<61> is set to ‘1’. If the count exceeds the
record length, the count is rounded down to the nearest multiple of the record length and
WRITE attempts to write that number of bytes. If the record length exceeds the count, it is
UNPREDICTABLE whether WRITE attempts to access the device. If no write attempt is
made, R0<63> is set to ‘1’.


For magnetic tape devices, WRITE does not interpret the tape format or differentiate between
ANSI formatted and unformatted tapes. The routine writes the requested transfer byte count
starting at the current tape position. WRITE terminates when any of the following occur:


• The specified number of bytes has been written without detecting an End-of-Tape
  (EOT) marker. In this case, R0<63:60> is set to ‘0000’.


• The specified number of bytes has been written and an End-of-Tape (EOT) marker was
  detected. In this case, R0<63:60> is set to ‘0100’.


• The routine runs off the end of tape. In this case, R0<63:60> is set to ‘1001’.


For disk devices, WRITE does not understand the file structure of the device. The routine
writes the requested transfer byte count starting at the logical block number specified by R20.
The transfer continues until either the specified number of bytes has been written or the last
logical block on the device has been written. If the logical end of the device is encountered,
then R0<63:62> is set to ‘01’.


For network devices, WRITE appends any device-specific or protocol-specific headers. The 
routine transmits the specified requested transfer bytes with the proper network protocol over 
the appropriate network. The actual packet size is dependent on the device and protocol; the 
characteristics of the network device and protocol are specified at the time of the channel 
OPEN. 


The buffer pointed to by R19 should be mapped and read accessible by the kernel; the return
address indicated by R26 should be mapped and executable by the kernel.
2.3.6 Console Environment Variable Routines


System software accesses the environment variables indirectly through console callback rou- 
tines. These routines may be invoked while the operating system is fully functional as well as 
during operating system bootstrap or crash. The GET_ENV, SET_ENV, and RESET_ENV 
routines are subject to the constraints given in Section 2.3.1. These routines must: 


• Not alter the current IPL or current mode.
  These routines must be invoked in kernel mode.
• Not alter the existing memory management policy.
  All internal pointers must be remapped by FIXUP.
• Not block interrupts.
  The operating system must be capable of continuing to receive hardware and software
  interrupts.


The constraints on SAVE_ENV differ; see Section 2.3.6.3. 


The time necessary for these routines to complete is UNPREDICTABLE; however, a console 
implementation will attempt to minimize the time whenever possible. 


Software Note: 


Implementations must limit the execution time of these routines to significantly less than 
the interval clock interrupt period. 


The console implementation must ensure that any access to an environment variable is atomic. 
The console implementation must resolve multiple competing accesses by system software as 
well as competing accesses by system software and the console presentation layer. 


When invoking these routines, system software must be executing in kernel mode. If these rou- 
tines are invoked in other modes, their execution causes UNPREDICTABLE operation. 


These routines may be invoked on both the primary and secondary processors in a multiproces- 
sor configuration. It is recommended that system software serialize competing accesses to a 
given environment variable; a stale value may be returned if GET_ENV is invoked simulta- 
neously with SET_ENV or RESET_ENV. 
2.3.6.1 GET_ENV — Get an Environment Variable


Format:
    status = DISPATCH ( GET_ENV, ID, value, length )
Inputs:
    GET_ENV = R16;  GET_ENV function code — 22₁₆
    ID      = R17;  ID of environment variable
    value   = R18;  Starting virtual address of buffer to contain returned value
    length  = R19;  Number of bytes in buffer (unsigned)
    retadr  = R26;  Return address
Outputs:
    status  = R0;   Status:
                    R0<63:61>  ‘000’  Success
                               ‘001’  Success, byte stream truncated
                               ‘110’  Failure, variable not recognized
                    R0<60:32>  SBZ
                    R0<31:0>   Count of bytes returned (unsigned)


GET_ENV causes the value of the environment variable specified by the ID in R17 to be 
returned in the byte stream specified by the virtual address in R18. The size in bytes of the 
input buffer is contained in R19. 


GET_ENV returns status in R0<63:61>. If the environment variable is recognized, R0<63:62>
is set to ‘00’, its current value is copied into the byte stream, and R0<31:0> is set to the
number of bytes copied. If the value must be truncated, R0<61> is set to ‘1’. If the variable is not
recognized, R0<63:61> is set to ‘110’ and R0<31:0> is set to ‘0’.


The byte stream indicated by R18 should be mapped and write accessible by the kernel; the 
return address indicated by R26 should be mapped and executable by the kernel. 
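
A caller can reduce the GET_ENV result to a byte count plus a truncation flag. The following C sketch shows one possible decoding; `get_env_count` is a hypothetical helper name, and returning -1 for an unrecognized variable is a convention chosen for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the number of bytes copied (R0<31:0>) when the variable was
 * recognized, and sets *truncated from R0<61>. Returns -1 for '110' or any
 * status code the text does not define, leaving *truncated untouched. */
int64_t get_env_count(uint64_t r0, bool *truncated)
{
    unsigned code = (unsigned)(r0 >> 61);  /* R0<63:61> */
    if (code > 0x1)
        return -1;
    *truncated = (code == 0x1);
    return (int64_t)(uint32_t)r0;
}
```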
2.3.6.2 RESET_ENV — Reset an Environment Variable


Format:
    status = DISPATCH ( RESET_ENV, ID, value, length )
Inputs:
    RESET_ENV = R16;  RESET_ENV function code — 21₁₆
    ID        = R17;  ID of environment variable
    value     = R18;  Starting virtual address of byte stream to contain returned
                      value
    length    = R19;  Number of bytes in buffer (unsigned)
    retadr    = R26;  Return address
Outputs:
    status    = R0;   Status:
                      R0<63:61>  ‘000’  Success
                                 ‘001’  Success, byte stream truncated
                                 ‘100’  Failure, variable read-only
                                 ‘101’  Failure, variable read-only, byte
                                        stream truncated
                                 ‘110’  Failure, variable not recognized
                      R0<60:32>  SBZ
                      R0<31:0>   Count of bytes returned (unsigned)


RESET_ENV causes the environment variable specified by the ID in R17 to be reset to the 
system default value and that default value to be returned in the byte stream specified by the 
virtual address in R18. The size in bytes of the input buffer is contained in R19. 


RESET_ENV returns status in R0<63:61>. If the environment variable is successfully reset to
the default value, R0<63:62> is set to ‘00’. If the variable is recognized but read-only, the
value is unchanged and R0<63:62> is set to ‘10’. In both cases, the default value is copied into
the byte stream and R0<31:0> is set to the number of bytes copied; if the value must be
truncated, R0<61> is set to ‘1’. If the variable is not recognized, R0<63:61> is set to ‘110’ and
R0<31:0> is set to ‘0’.


The byte stream indicated by R18 should be mapped and write accessible by the kernel; the 
return address indicated by R26 should be mapped and executable by the kernel. 
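
RESET_ENV combines a two-bit outcome in R0<63:62> with a truncation flag in R0<61>. The C sketch below separates these, following the status table above; the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct reset_env_result {
    bool reset_done;      /* R0<63:62> == '00': variable reset to default */
    bool read_only;       /* R0<63:62> == '10': value unchanged */
    bool not_recognized;  /* R0<63:61> == '110' */
    bool truncated;       /* R0<61> set while the variable was recognized */
    uint32_t count;       /* R0<31:0>, bytes of the default value copied */
};

/* Interpret a raw R0 value returned by RESET_ENV. */
struct reset_env_result decode_reset_env(uint64_t r0)
{
    unsigned outcome = (unsigned)(r0 >> 62);       /* R0<63:62> */
    bool bit61 = ((r0 >> 61) & 1) != 0;
    struct reset_env_result r;
    r.reset_done     = (outcome == 0x0);
    r.read_only      = (outcome == 0x2);
    r.not_recognized = (outcome == 0x3) && !bit61;
    r.truncated      = bit61 && (r.reset_done || r.read_only);
    r.count          = (uint32_t)r0;
    return r;
}
```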
2.3.6.3 SAVE_ENV — Save Current Environment Variables


Format:
    status = DISPATCH ( SAVE_ENV )
Inputs:
    SAVE_ENV = R16;  SAVE_ENV function code — 23₁₆
    retadr   = R26;  Return address
Outputs:
    status   = R0;   Status:
                     R0<63:61>  ‘000’  Success, all values saved
                                ‘001’  Success, some bytes saved,
                                       additional values to be saved
                                ‘110’  Failure, routine unsupported
                                ‘111’  Failure, error encountered saving
                                       values
                     R0<60:0>   SBZ


SAVE_ENV attempts to update the nonvolatile storage of those environment variables that 
must be retained across console initializations and system power transitions. 


Programming Note: 


For example, SAVE_ENV may cause an EEPROM to be updated. That update may write 
all "NV" environment variable values to the EEPROM, or may only write those variables 
that have been modified since the last update or console initialization. 


This routine is not subject to the constraints given in Section 2.3.6. The console may usurp
operating system control of the system platform hardware, but must restore any such control
or altered state before return. The console must not service any interrupts or exceptions that
are otherwise intended for the operating system.


The nonvolatile storage update may take significant time and multiple invocations of 
SAVE_ENV may be necessary. The time necessary for this routine to complete is UNPRE- 
DICTABLE. A console implementation will attempt to minimize the time whenever possible 
and must return in a timely fashion. The routine must return after partial operation completion 
if necessary. It is the responsibility of the console to ensure that subsequent calls make for- 
ward progress. The operating system may delay for extended periods between subsequent 
calls; the console must not rely on timely invocations of SAVE_ENV. 


Implementation Note:
Implementations must limit the execution time of these routines to significantly less than
the interval clock interrupt period. A return after partial operation completion is preferable
to long latency.


SAVE_ENV returns status on the update in R0<63:61>. When the update has successfully
completed and all relevant variables have been saved, the routine returns with R0<63:61> set
to ‘000’. If SAVE_ENV returns after only a partial update to ensure timely response,
R0<63:61> is set to ‘001’. If an unrecoverable error is encountered, the routine returns with
R0<63:61> set to ‘111’; in that case, the contents of the nonvolatile storage are UNDEFINED.


Implementation of SAVE_ENV is optional. If the console does not support SAVE_ENV, the
routine returns with R0<63:61> set to ‘110’.


On a multiprocessor system with an embedded console, the routine must be invoked on each 
processor in the configuration. See Section 3.8.1. 


It is recommended that system software ensure that calls to SET_ENV or RESET_ENV are 
not issued while an update operation is in progress on any processor. It is UNPREDICTABLE 
whether the updated environment value is saved. 


The return address indicated by R26 should be mapped and executable by the kernel. This rou- 
tine does not affect the current value of any environment variable maintained by the console. 
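
Since SAVE_ENV may return ‘001’ after a partial update, system software typically has to call it repeatedly until it reports completion or failure. The C sketch below illustrates such a loop under stated assumptions: `save_env_fn` stands for a hypothetical wrapper around DISPATCH ( SAVE_ENV ), `max_calls` is an arbitrary bound chosen by the caller, and `fake_save_env` is a stand-in for the console used only to exercise the loop.

```c
#include <stdint.h>

/* Hypothetical wrapper type: performs DISPATCH(SAVE_ENV) and returns R0. */
typedef uint64_t (*save_env_fn)(void);

enum save_result { SAVE_DONE, SAVE_UNSUPPORTED, SAVE_ERROR, SAVE_GAVE_UP };

/* Call SAVE_ENV until it stops reporting a partial update. */
enum save_result save_env_all(save_env_fn call, unsigned max_calls)
{
    for (unsigned i = 0; i < max_calls; i++) {
        unsigned code = (unsigned)(call() >> 61);  /* R0<63:61> */
        switch (code) {
        case 0x0: return SAVE_DONE;         /* '000' all values saved */
        case 0x1: continue;                 /* '001' partial, call again */
        case 0x6: return SAVE_UNSUPPORTED;  /* '110' routine unsupported */
        default:  return SAVE_ERROR;        /* '111' or an undefined code */
        }
    }
    return SAVE_GAVE_UP;
}

/* Stand-in console: reports two partial updates, then success. */
unsigned fake_calls;
uint64_t fake_save_env(void)
{
    return (++fake_calls < 3) ? (1ULL << 61) : 0;
}
```

The console is responsible for forward progress across calls, so the loop needs no delay between iterations; a real caller may nevertheless do other work between invocations.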
2.3.6.4 SET_ENV — Set an Environment Variable


Format:
    status = DISPATCH ( SET_ENV, ID, value, length )
Inputs:
    SET_ENV = R16;  SET_ENV function code — 20₁₆
    ID      = R17;  ID of environment variable
    value   = R18;  Starting virtual address of byte stream containing value
    length  = R19;  Number of bytes in buffer (unsigned)
    retadr  = R26;  Return address
Outputs:
    status  = R0;   Status:
                    R0<63:61>  ‘000’  Success
                               ‘100’  Failure, variable read-only
                               ‘110’  Failure, variable not recognized
                               ‘111’  Failure, byte stream exceeds value
                                      length
                    R0<60:32>  SBZ
                    R0<31:0>   Maximum value length (unsigned)


SET_ENV causes the environment variable specified by the ID in R17 to have the value speci- 
fied by the byte stream value pointed to by the virtual address in R18. The size in bytes of the 
input buffer is contained in R19. 


SET_ENV returns status in R0<63:61>. If the environment variable is successfully set to the
new value, R0<63:61> is set to ‘000’. If the variable is not recognized, R0<63:61> is set to
‘110’. If the variable is read-only, the value is unchanged and R0<63:61> is set to ‘100’. If the
input buffer exceeds the maximum value length, the value is unchanged and R0<63:61> is set
to ‘111’. In all cases, the maximum value length is returned in R0<31:0>.


The byte stream indicated by R18 should be mapped and read accessible by the kernel; the 
return address indicated by R26 should be mapped and executable by the kernel. 
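
Because SET_ENV returns the maximum value length in R0<31:0> in all cases, a caller that receives ‘111’ can learn how much it must shorten the value. A minimal C sketch of the decoding follows; the enumeration and function names are hypothetical.

```c
#include <stdint.h>

enum set_env_status {
    SET_ENV_OK,              /* '000' */
    SET_ENV_READ_ONLY,       /* '100' */
    SET_ENV_NOT_RECOGNIZED,  /* '110' */
    SET_ENV_TOO_LONG,        /* '111' byte stream exceeds value length */
    SET_ENV_UNDEFINED        /* any code the status table does not list */
};

/* Decode R0 from SET_ENV; *max_len always receives R0<31:0>. */
enum set_env_status decode_set_env(uint64_t r0, uint32_t *max_len)
{
    *max_len = (uint32_t)r0;
    switch ((unsigned)(r0 >> 61)) {  /* R0<63:61> */
    case 0x0: return SET_ENV_OK;
    case 0x4: return SET_ENV_READ_ONLY;
    case 0x6: return SET_ENV_NOT_RECOGNIZED;
    case 0x7: return SET_ENV_TOO_LONG;
    default:  return SET_ENV_UNDEFINED;
    }
}
```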
2.3.7 Miscellaneous Routines


2.3.7.1 FIXUP — Fixup Virtual Addresses in Console Routines 


Format:
    status = FIXUP ( NEW_BASE_VA, HWRPB_VA )
Inputs:
    NEW_BASE_VA = R16;  New starting virtual address of the console callback routines
    HWRPB_VA    = R17;  New starting virtual address of the HWRPB
    retadr      = R26;  Return address
Outputs:
    status      = R0;   Status:
                        R0<63>    ‘0’  Success
                                  ‘1’  Failure
                        R0<62:0>  SBZ


FIXUP adjusts virtual address references in all other console callback routines using the new 
starting virtual address in R16, the new starting virtual address of the HWRPB in R17, and the 
current contents of the CRB. See Section 2.3.8.1.2 for a full description of FIXUP usage and 
functionality. 


If FIXUP is successful, it returns with R0<63> set to ‘0’. If FIXUP is not successful, console
internal state has been compromised. The console attempts a cold bootstrap if the state
transition in Figure 3-1 indicates a bootstrap and the BOOT_RESET environment variable is set to
"ON" (4E4F₁₆). Otherwise, the system remains in console I/O mode.


This routine must be called in kernel mode and in the context of the existing memory map- 
ping; otherwise its execution causes UNPREDICTABLE or UNDEFINED operation. 


Software Note: 
FIXUP must be called while the original address space mapping is in effect. 


The return address indicated by R26 should be mapped and executable by the kernel. 
2.3.7.2 PSWITCH — Switch Primary Processors


Format:
    status = DISPATCH ( PSWITCH, action, cpu_id )
Inputs:
    PSWITCH = R16;  PSWITCH function code — 30₁₆
    action  = R17;  Action requests:
                    R17<63:2>  SBZ
                    R17<1:0>   ‘01’  Transition from primary
                               ‘10’  Transition to primary
                               ‘11’  Switch primary
    cpu_id  = R18;  New primary CPU ID
    retadr  = R26;  Return address
Outputs:
    status  = R0;   Status:
                    R0<63>    ‘0’  Success
                              ‘1’  Failure, operation not supported
                    R0<62:0>  Implementation-specific error status


PSWITCH attempts to perform any implementation-specific functions necessary to support 
primary switching. R17 indicates the requested primary transition action. R18 contains the 
CPU ID (WHAMI IPR) of the new primary. 


PSWITCH is invoked by the old primary, the secondary that is to become the new primary, or
both. See Section 3.5.6 for a full description of PSWITCH usage, functionality, and error
returns.


If PSWITCH is successful, it returns with R0<63> set to ‘0’. If PSWITCH is unsuccessful for
any reason, it returns with R0<63> set to ‘1’ and implementation-specific status in R0<62:0>.


PSWITCH must be invoked at the highest IPL; otherwise, it produces UNDEFINED results. The
return address indicated by R26 should be mapped and executable by the kernel.
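
The PSWITCH argument and result encodings are small enough to capture in a few helpers. The following C sketch is illustrative only; all names are hypothetical, and the invocation through DISPATCH (including the IPL requirement) is outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the R17<1:0> action codes. */
enum pswitch_action {
    PSWITCH_FROM_PRIMARY = 0x1,  /* '01' transition from primary */
    PSWITCH_TO_PRIMARY   = 0x2,  /* '10' transition to primary */
    PSWITCH_SWITCH       = 0x3   /* '11' switch primary */
};

/* Build the R17 argument; R17<63:2> is SBZ. */
uint64_t pswitch_action_arg(enum pswitch_action a)
{
    return (uint64_t)a & 0x3;
}

/* R0<63> set means the operation failed. */
bool pswitch_failed(uint64_t r0)
{
    return (r0 >> 63) != 0;
}

/* R0<62:0> carries implementation-specific error status. */
uint64_t pswitch_impl_status(uint64_t r0)
{
    return r0 & 0x7FFFFFFFFFFFFFFFULL;
}
```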
2.3.7.3 BIOS_EMUL — Run BIOS Emulation Callback


Format:
    status = DISPATCH ( BIOS_EMUL, int86, input_flags, x86_regs, additional_data )
Inputs:
    func_code       = R16;  BIOS_EMUL function code — 32₁₆
    int86           = R17;  BIOS interrupt number (also called the BIOS service number)
    input_flags     = R18;  The following input flags:
                            R18<63:5>  SBZ
                            R18<4>     ‘1’  Use data in R20
                                       ‘0’  Ignore R20
                            R18<3:1>   Type of BIOS emulator to service the call:
                                       ‘000’  16-bit emulator type
                                       ‘001’  32-bit emulator type
                                       ‘010’  64-bit emulator type
                                       ‘011’  Reserved
                                       ‘100’  Reserved
                                       ‘101’  Reserved
                                       ‘110’  Reserved
                                       ‘111’  Reserved
                            R18<0>     Type of call:
                                       ‘1’  Emulator type inquiry
                                       ‘0’  Service
    x86_regs        = R19;  Virtual address of x86 register data block that represents the
                            x86 register set for BIOS calls.
                            Use the appropriate register structure for the type of BIOS
                            emulator:
                            16-bit emulator — Use register structure 1 (Figure 2-5)
                            32-bit emulator — Use register structure 2 (Figure 2-5)
                            64-bit emulator — Not defined for this version of the
                                              architecture
    additional_data = R20;  Virtual address of additional argument data. Specific to BIOS
                            call
    retadr          = R26;  Return address




Outputs:
    status = R0;  Status:

    If R18<0> = 0, R0 has the following meaning:
                  R0<63>     ‘0’  Callback supported
                             ‘1’  Callback not supported
                  R0<62>     ‘0’  Emulator type supported
                             ‘1’  Emulator type not supported
                  R0<61>     ‘0’  Service number supported
                             ‘1’  Service number not supported
                  R0<60:56>  SBZ
                  R0<55:0>   Implementation-specific

    If R18<0> = 1, R0 has the following meaning:
                  R0<63>     ‘0’  Callback supported
                             ‘1’  Callback not supported
                  R0<62:59>  SBZ
                  R0<58:56>  Return console’s emulator type:
                             ‘000’  No emulator in this console
                             ‘001’  16-bit emulator in this console
                             ‘010’  32-bit emulator in this console
                             ‘011’  64-bit emulator in this console
                  R0<55:0>   SBZ


The resulting x86 register state from the BIOS call is placed in the data block located at 
x86_regs (R19). Success or failure of the BIOS call is specific to the attempted call and the 
expected result in x86_regs. 


The BIOS_EMUL callback provides access to the BIOS emulator, allowing emulation of the 
x86 INT assembler instruction. 


The int86 value specifies the BIOS interrupt number to be emulated. A data block representing 
the x86 register set is used as input and is updated on return because operation of BIOS calls 
requires setting the x86 register set before the BIOS call and receiving data in them as the 
result of a BIOS call. 


Programming Notes: 


If a platform or pre-existing version of the firmware does not support BIOS_EMUL,
R0<63> returns ‘1’.


The caller can determine the type of BIOS emulator in the console by setting R18<0> to ‘1’.
BIOS_EMUL returns the type in R0<58:56>.


Because multiple BIOS emulators can be built into the console, use R18<3:1> to specify the
type of BIOS emulator and register structure. If the console does not support a specified type,
R0<63> and R0<62> return ‘1’.


BIOS_EMUL supports only INT10 service calls, and for any other service number, R0<63>
returns ‘1’.


The caller should maintain the integrity of the register structure as input/output across multiple 
calls. The routine uses the register structure values as passed and returns the end values in the 
same structure. 

The return address indicated by R26 should be mapped and kernel-executable. 
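
The R18 input flags and the inquiry result can be packed and unpacked as in the following C sketch. Note that, per the tables above, the emulator type is encoded differently in the request (R18<3:1>, where ‘000’ is 16-bit) and in the inquiry response (R0<58:56>, where ‘001’ is 16-bit); the last helper converts between the two. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build R18: bit 4 = use R20, bits 3:1 = emulator type, bit 0 = inquiry. */
uint64_t bios_emul_flags(bool use_r20, unsigned emu_type, bool inquiry)
{
    return ((uint64_t)use_r20 << 4)
         | ((uint64_t)(emu_type & 0x7) << 1)
         | (uint64_t)inquiry;
}

/* Extract R0<58:56> from an inquiry response (R18<0> = 1). */
unsigned bios_emul_console_type(uint64_t r0)
{
    return (unsigned)((r0 >> 56) & 0x7);
}

/* Map an inquiry response type to the R18<3:1> request code.
 * Returns -1 when the console reports no emulator ('000'). */
int bios_emul_request_type(unsigned console_type)
{
    return console_type == 0 ? -1 : (int)console_type - 1;
}
```

A caller would typically issue the inquiry once, convert the reported type, and reuse the resulting code in R18<3:1> for subsequent service calls.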


Figure 2-5: BIOS Emulator Register Structures

[Figure: Register Structure 1 and Register Structure 2, each drawn as 32-bit longwords with
bit positions 31, 24, 23, 16, 15, 8, 7, and 0 marked.]


Background Notes on BIOS Emulation:


• BIOS


  BIOS, or Basic Input Output System, is firmware that initializes the hardware and sets
  it to a known state or to a state that is chosen by the hardware vendor or the system
  user. The BIOS code performs a power-up self-test (POST), configures buses and
  devices, and provides an interface to boot the operating system. BIOS code can also
  provide a set of functions that allows other system software to program devices to a
  given mode or state. Those functions are device-dependent, but follow an industry
  standard that is supported by most hardware vendors. Most BIOS code is written in
  the x86 assembly language.


e BIOS Emulation 


To support standard BIOS firmware (x86-based) on Alpha-based platforms, the Alpha 
console has a built-in emulator that emulates the x86 instruction set. The emulator 
supports VGA BIOS functions and is limited to the less complex, INT10, VGA BIOS 
calls. The emulator supports a large number of third-party graphics cards on 
Alpha-based platforms. 


The emulator can be 16 bit or 32 bit. A 16-bit emulator limits its support to the 16-bit 
register and instruction sets. A 32-bit emulator supports the 32-bit register and 
instruction sets, as well as the 16-bit instruction set. 


e BIOS_EMUL Callback Routine 


The BIOS_EMUL callback routine provides a generic interface to the BIOS emulator.
It provides a mechanism to request the console’s BIOS emulator type and returns 
appropriate status and error codes that indicate supported and unsupported arguments. 
Operating systems require this interface to support third-party graphics cards for 
different Alpha platforms. 


Commodity PC graphics cards (SVGA) rely heavily on the BIOS to set the graphics 
mode. Vendors generally do not document how to set a graphics mode by register 
programming (like 1280x1024), but instead refer to the BIOS INT10 call, which is 
used to set up the card. Without the interface provided by BIOS_EMUL, the operating 
system has no access to BIOS emulation, and the graphics cards must be programmed 
by specialized code in the driver. Further, BIOS_EMUL allows the operating system 
to maintain support for graphics cards when vendors release new versions, because the 
interface lets the operating system continue to correctly interact with any changed 
mode parameters. 
2.3.8 Console Callback Routine Data Structures


The console and system software share two data structures that are necessary for the console 
callback routines: the Console Routine Block (CRB) and the Console Terminal Block (CTB) 
table. Both are located by offset fields in the HWRPB as shown in Figure 2-6. 


The CRB locates all addresses necessary for console callback routine function. The base physi- 
cal address of the CRB is obtained by adding the CRB OFFSET field at HWRPB[192] to the 
base physical address of the HWRPB. The CRB format is shown in Figure 2-7 and described
in Table 2-11. 


The CTB table contains information necessary to describe the console terminal devices. The 
base physical address of the CTB table is obtained by adding the CTB TABLE OFFSET field 
at HWRPB[184] to the base physical address of the HWRPB. The CTB format is shown in 
Figure 2-8 and described in Table 2-12. 
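
The address arithmetic described above can be sketched as follows. This is a minimal illustration, not console code: it assumes the HWRPB has been copied into a Python byte string, that quadwords are stored little-endian (as on Alpha), and the helper names and the sample offsets 0x400 and 0x600 are hypothetical.

```python
import struct

# Offsets of the two OFFSET fields within the HWRPB, as given in the text.
HWRPB_CTB_TABLE_OFFSET = 184
HWRPB_CRB_OFFSET = 192


def read_quadword(image, offset):
    """Return the unsigned 64-bit little-endian value stored at offset."""
    return struct.unpack_from("<Q", image, offset)[0]


def locate_console_blocks(hwrpb_pa, hwrpb):
    """Return (CRB base PA, CTB table base PA) for a copy of the HWRPB."""
    crb_pa = hwrpb_pa + read_quadword(hwrpb, HWRPB_CRB_OFFSET)
    ctb_pa = hwrpb_pa + read_quadword(hwrpb, HWRPB_CTB_TABLE_OFFSET)
    return crb_pa, ctb_pa


# Hypothetical HWRPB image: only the two offset fields are filled in.
hwrpb = bytearray(256)
struct.pack_into("<Q", hwrpb, HWRPB_CTB_TABLE_OFFSET, 0x400)
struct.pack_into("<Q", hwrpb, HWRPB_CRB_OFFSET, 0x600)
crb_pa, ctb_pa = locate_console_blocks(0x2000, bytes(hwrpb))
```

Both structures are located relative to the HWRPB base, so relocating the HWRPB physically would move them together as long as the offsets are unchanged.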


Figure 2-6: Console Data Structure Linkage 

[Figure: The HWRPB holds an offset to the CTB table and an offset to the CRB. The CRB 
holds the VA and PA of the DISPATCH procedure value, the VA and PA of the FIXUP 
procedure value, the number of entries in the map, the number of pages in the map, and 
the virtual/physical map. The DISPATCH procedure value locates a procedure descriptor 
whose first quadword is followed by the VA of the DISPATCH entry, which in turn locates 
the DISPATCH procedure.] 


2.3.8.1 Console Routine Block 


Before transferring control to system software, the console ensures that the console callback 
routines, console-private data structures, and associated local I/O space locations are mapped 
into region 0 of initial bootstrap address space. All necessary pages are located by the console 
routine block (CRB). 


Figure 2-7: Console Routine Block 

 63                                                         0 
 Virtual Address of DISPATCH Procedure Descriptor           :CRB 
 Physical Address of DISPATCH Procedure Descriptor          :+08 
 Virtual Address of FIXUP Procedure Descriptor              :+16 
 Physical Address of FIXUP Procedure Descriptor             :+24 
 Number of Entries in the Virtual-Physical Map              :+32 
 Number of Pages to Be Mapped                               :+40 
 Virtual Address for Entry 1                                :+48 
 Physical Address for Entry 1                               :+56 
 Page Count for Entry 1                                     :+64 
 ...                                                        :+72 
 Virtual Address for Entry Last 
 Physical Address for Entry Last 
 Page Count for Entry Last 






Table 2-11: CRB Fields 


Offset Description 


CRB DISPATCH VA — The virtual address of the OpenVMS Alpha procedure 
descriptor for the DISPATCH procedure. 

+08 DISPATCH PA — The physical address of the OpenVMS Alpha procedure 
descriptor for the DISPATCH procedure. 

+16 FIXUP VA — The virtual address of the OpenVMS Alpha procedure descriptor 
for the FIXUP procedure. 

+24 FIXUP PA — The physical address of the OpenVMS Alpha procedure descriptor 
for the FIXUP procedure. 

+32 ENTRIES — The number of entries in the virtual-physical map. Unsigned inte- 
ger. 

+40 PAGES — The total number of physical pages to be mapped. Unsigned integer. 

+48 ENTRY — Each entry identifies a collection of physically contiguous pages to be 
mapped. Each map entry consists of three quadwords: 

    Offset  Name         Description 
    +00     ENTRY_VA     Base virtual address for entry 
    +08     ENTRY_PA     Base physical address for entry 
    +16     ENTRY_PAGES  Number of contiguous physical pages to be 
                         mapped. Unsigned integer. 


The CRB must be quadword aligned. The DISPATCH and FIXUP addresses must be quad- 
word aligned; all unused bits should be zero. The ENTRY addresses must be page aligned and 
all unused bits should be zero. 


The DISPATCH and FIXUP procedure descriptors located by DISPATCH_PA, 
DISPATCH_VA, FIXUP_PA and FIXUP_VA must be contained within the pages located by 
the first virtual-physical map entry. 
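
As an illustration of the layout in Table 2-11, the following sketch decodes a CRB from raw bytes. It assumes little-endian quadwords; the type names `Crb` and `MapEntry`, the function `parse_crb`, and all sample addresses are hypothetical and chosen only for this example.

```python
import struct
from collections import namedtuple

MapEntry = namedtuple("MapEntry", "va pa pages")
Crb = namedtuple(
    "Crb", "dispatch_va dispatch_pa fixup_va fixup_pa entries pages map"
)


def parse_crb(raw):
    """Decode the six header quadwords and the map entries that follow."""
    header = struct.unpack_from("<6Q", raw, 0)
    count = header[4]  # ENTRIES field at CRB+32
    # Map entries start at CRB+48; each is three quadwords (24 bytes).
    entries = [
        MapEntry(*struct.unpack_from("<3Q", raw, 48 + 24 * i))
        for i in range(count)
    ]
    return Crb(*header, entries)


# Hypothetical CRB with two map entries covering 1 + 2 pages.
raw = (
    struct.pack("<6Q", 0x10000, 0x8000, 0x10040, 0x8040, 2, 3)
    + struct.pack("<3Q", 0x10000, 0x8000, 1)
    + struct.pack("<3Q", 0x20000, 0xC000, 2)
)
crb = parse_crb(raw)
```

The parser reads exactly ENTRIES map entries, so a caller only needs the CRB base and enough bytes to cover 48 + 24 × ENTRIES.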


2.3.8.1.1 Console Routine Block Initialization 


Before transferring control to system software, the console initializes all fields of the CRB. 
The console fills in all physical and virtual address fields, the number of entries in the vir- 
tual-physical map (ENTRIES), the total number of pages to be mapped (PAGES), and the 
virtual addresses contained in the OpenVMS Alpha procedure descriptors for the DISPATCH 
and FIXUP procedures.¹ PAGES is the sum of the contents of all ENTRY_PAGES fields. 


All addresses are initially mapped within region 0 of the initial bootstrap address space. These 
addresses include the contents of the CRB and all addresses contained within the DISPATCH 
and FIXUP procedure descriptors. The mapping must permit kernel access with appropriate 
read/write/execute access. The KRE, KWE, and FOx PTE fields are never subsequently 
altered by system software. The initial mapping need not be virtually contiguous. 
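
The alignment and consistency rules stated for the CRB lend themselves to a simple check. The sketch below is one possible way to express them; the 8 KB page size is an assumption (other page sizes are possible), and the function name and sample values are hypothetical.

```python
PAGE_SIZE = 8192  # assumption: 8 KB pages; adjust for other implementations


def check_crb(crb_pa, dispatch_pa, fixup_pa, pages, entries):
    """Return a list of rule violations.

    entries is a list of (entry_va, entry_pa, entry_pages) tuples.
    """
    problems = []
    if crb_pa % 8:
        problems.append("CRB not quadword aligned")
    descriptors = (("DISPATCH_PA", dispatch_pa), ("FIXUP_PA", fixup_pa))
    for name, addr in descriptors:
        if addr % 8:
            problems.append(name + " not quadword aligned")
    for va, pa, _ in entries:
        if va % PAGE_SIZE or pa % PAGE_SIZE:
            problems.append("map entry not page aligned")
    if pages != sum(count for _, _, count in entries):
        problems.append("PAGES is not the sum of ENTRY_PAGES")
    if entries:
        # Both procedure descriptors must lie in the first map entry.
        _, first_pa, first_count = entries[0]
        end = first_pa + first_count * PAGE_SIZE
        for name, addr in descriptors:
            if not first_pa <= addr < end:
                problems.append(name + " outside first map entry")
    return problems


sample_map = [(0x10000, 0x8000, 1), (0x20000, 0xC000, 2)]
ok = check_crb(0x6000, 0x8000, 0x8040, 3, sample_map)
bad = check_crb(0x6000, 0x8000, 0x8040, 4, sample_map)
```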


2.3.8.1.2 Console Routine Remapping 


When the console transfers control to the system software, the console callback routines may 
be invoked by the system software without additional setup. All necessary virtual mappings 
into initial bootstrap address space must be performed by the console before transferring 
control. 


The system software may virtually remap the console callback routines. Remapping permits 
the system software to relocate the routines to virtual addresses other than those assigned in 
initial bootstrap address space. Relocation requires that the console adjust (or fix up) various 
internal virtual address references. 


1 The OpenVMS Alpha calling standard specifies that the second quadword of a procedure descriptor 
contains the entry address (virtual) of the procedure itself. 


The system software invokes the FIXUP routine to enable the console to perform the necessary 
internal relocations. The FIXUP routine virtually relocates all console routines and 
adjusts any console-private virtual address pointers such as those used to locate a local I/O 
device or HWRPB data structure. If system software virtually remaps the HWRPB, FIXUP 
must be invoked before calling any other console callback routine; it is recommended that 
system software remap both the HWRPB and the console routines together. Calling the console 
callback routines after the HWRPB has been remapped from its original bootstrap address 
location results in UNDEFINED operation of the system. 


To remap the console callback routines, the system software and the console cooperate as 
follows: 


1. System software must be executing on the primary processor in a multiprocessor system. 


2. System software determines the new base virtual address of the HWRPB; this remapping 
is optional. System software does not perform any remapping of the HWRPB at 
this step. 

System software need not remap the memory data descriptor table located by 
HWRPB[200]. See Section 2.1 for a description of the HWRPB and its size. 


3. System software determines the new base virtual address of the console callback routines. 
The CRB entries will be mapped into a set of virtually contiguous pages. The 
CRB PAGES field (CRB[40]) is used to determine the number of pages that must be 
mapped. System software does not perform any remapping of the console callback routines 
at this step. 


4. System software passes control to the console by calling FIXUP (NEW_BASE_VA, 
NEW_HWRPB_VA), initiating the remapping. NEW_BASE_VA is the new base virtual 
address as established in step 3. NEW_HWRPB_VA is the new starting virtual address of 
the HWRPB as established in step 2. The remapping process is only initiated at this 
step; do not attempt to access the HWRPB or CRB using the new VAs. 


5. The console first locates the HWRPB, then locates the CRB using the CRB OFFSET 
field. The console then locates all internal pointers and adjusts them. All linkage sections 
and other console-internal pointers must be modified. These data structures can be 
located during FIXUP because the initial bootstrap address space mapping is in effect; 
any console-internal pointers are valid until modified. 

System software need not remap the optional CONFIG block or FRU table located by 
HWRPB OFFSET fields. If these blocks will subsequently be used by the console, 
they must be located by console-internal pointers and those pointers must be modified 
during FIXUP. 

DISPATCH and FIXUP are not uniquely remapped by the system software. The 
FIXUP must update the DISPATCH and FIXUP procedure descriptors located by 
CRB[8] and CRB[24]. The physical pages containing the procedure descriptors and 
the routines themselves must be included in the virtual-physical map. 

The relative virtual address offsets of the pages located by the entry map are not 
guaranteed to be retained across the FIXUP. The initial bootstrap address mapping of 
the physical pages located by the entry map is not required to be virtually contiguous. 
The system software remapping is required to be virtually contiguous. Any offsets that 
cross physical pages may have to be modified by FIXUP. 


6. The console returns from FIXUP. If the FIXUP was not successful, console internal 
state has been compromised. The console attempts a cold bootstrap if the state transition 
in Figure 3-1 indicates a bootstrap and the BOOT_RESET environment variable is 
set to "ON" (4E4F₁₆). Otherwise, the system remains in console I/O mode. 


7. System software updates each virtual-physical map entry of the CRB: 


A. The PTE and TB entries that correspond to the range of old virtual address are 
invalidated using the old ENTRY_VA and ENTRY_PAGES values. 


B. The new starting virtual address is written into the ENTRY_VA. This virtual 
address is computed by adding the NEW_BASE_VA to the sum of the page counts 
(ENTRY_PAGES) of each preceding entry, converted to bytes. 


C. New PTEs are constructed for each physical page. The new PTE FOx and protec- 
tion fields are copied from the original bootstrap address PTE. 


Programming Note: 

It is the responsibility of the console to judiciously set both the protection and 
FOx bits in the bootstrap address PTE. In particular, if the console sets the FOE 
bit, there is no architectural guarantee that the console exception handler will gain 
control as a result, nor is there any obvious appropriate response for the operating 
system handler. 


8. System software updates the DISPATCH and FIXUP VAs. The first virtual-physical 
map entry locates the physical page that contains the DISPATCH and FIXUP procedure 
descriptors. 


9. System software updates all PTEs and invalidates all appropriate TB entries associated 
with the remapped HWRPB and any remapped OFFSET blocks. 


At the completion of this process, the console callback routines are remapped and may again 
be used by system software. Since FIXUP itself is relocated, system software may remap the 
routines more than once. 
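
The recomputation of each ENTRY_VA during remapping can be sketched as below. This is an illustration under stated assumptions: the page size is taken to be 8 KB, the map is represented as a list of (ENTRY_VA, ENTRY_PA, ENTRY_PAGES) tuples, and the function name and addresses are hypothetical. PTE construction and TB invalidation are outside the scope of the sketch.

```python
PAGE_SIZE = 8192  # assumption: 8 KB pages


def remap_entries(new_base_va, entries):
    """Return the map with ENTRY_VA values packed contiguously from new_base_va.

    Physical addresses and page counts are left unchanged; only the
    virtual base of each entry moves.
    """
    remapped = []
    next_va = new_base_va
    for _, pa, pages in entries:
        remapped.append((next_va, pa, pages))
        # The next entry starts right after the pages of this one.
        next_va += pages * PAGE_SIZE
    return remapped


# Initial bootstrap mapping: not virtually contiguous.
old_map = [(0x10000, 0x8000, 1), (0x40000, 0xC000, 2), (0x90000, 0x20000, 1)]
new_map = remap_entries(0x200000, old_map)
```

Because the new mapping is virtually contiguous while the bootstrap mapping need not be, the distance between two entries generally changes, which is why offsets that cross physical pages may need adjustment by FIXUP.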


2.3.8.2 Console Terminal Block Table 


The Console Terminal Block (CTB) table indicates the current identity and characteristics of 
each console terminal device. The CTB table is the only data structure shared by the console 
and system software that describes the terminal devices accessible by console callback 
routines. 


The CTB table contains an array of CTBs. Each CTB is a quadword-aligned structure with the 
format shown in Figure 2-8 and described in Table 2-12. The index of the CTB in the CTB 
table is the unit number of the terminal device. The CTB format consists of two parts: a header 
and a device-specific segment. The format of the header is common to all CTBs; the format of 
the device-specific segment is dependent on the unique device type. 


There is only one console terminal. The console terminal unit is selected by the console presen- 
tation layer before bootstrapping the operating system. See Section 1.3. Once the operating 
system is bootstrapped, the console terminal unit should not be changed by the console presen- 
tation layer. Any attempt to do so results in UNDEFINED operation of the console. 
Specifically, if the console presentation layer halts the operating system, alters the console 
terminal unit, then restarts or continues operating system execution, the operation of the console 
is UNDEFINED. The console terminal unit is identified by the TTY_DEV environment 
variable. 


During console initialization, the console: 

1. Locates all console terminal devices. 
2. Selects the console terminal. 
3. Builds a CTB for each. 
4. Initializes the CTB OFFSET field of the HWRPB. 
5. Initializes each console terminal device. 
6. Records the default state of each console terminal device in its CTB. 
7. Records the unit number of the console terminal in the TTY_DEV environment variable. 


Whenever the console changes the state of a console terminal device, the console must update 
its CTB to reflect the change. The console may record extended status on character transfers 
(GETC/PUTS) in the CTB. 


System software uses the CTB to determine console terminal device characteristics. System 
software never directly modifies the contents of a CTB; such modifications can result in 
UNDEFINED operation of the console terminal device either as the result of a subsequent call 
to a console terminal routine or as the result of a console internal need to access a console ter- 
minal device (for example, as the result of a halt). System software calls the 
SET_TERM_CTL console terminal routine to change console terminal device characteristics. 


Figure 2-8: Console Terminal Block 

 63                                                         0 
 Device Type                                                :CTB 
 Device ID                                                  :+08 
 Reserved                                                   :+16 
 Length of Device-Specific Data                             :+24 
 Device-Specific Data Segment                               :+32 


Table 2-12: CTB Fields 


Offset Description 


CTB DEVICE TYPE — Console terminal device type and format of the device-spe- 
cific segment. Defined device types are: 


Type Description 


0 No console present 

1 Detached service processor 

2 Serial line UART 

3 Graphics display with LK keyboard connected to serial line UART 
4 Multipurpose 

Other Reserved 


+08 DEVICE ID — The physical device and channel that sends and receives the con- 
sole terminal stream. This field is necessary for configurations that include multi- 
ple-channel devices or multiple single-channel devices. The field has two 
subfields: 


Bits Description 


<63:32> Device index 
<31:0> Channel index 


For implementations that support only a single directly connected console termi- 
nal device, this field is set to zero. The device ID is not necessarily related to the 
console terminal device unit number. 


+16 RESERVED — This field is reserved for future expansion and may not be used 
by the console or system software. 


+24 DSD LENGTH — This field specifies the number of bytes in the device-specific 
data field, DSD. 


+32 DSD — This field contains device-specific data associated with the unique con- 
sole terminal type. Device-specific data may include such parameters as baud rate, 
flow control enable, and the current state of the CAPS LOCK key. The DSD field 
should contain only those items that must be shared between the console and sys- 
tem software. 
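
A CTB header can be decoded along the lines of Table 2-12 as in the sketch below. It assumes little-endian quadwords; the function name `parse_ctb`, the dictionary keys, and the four bytes of sample device-specific data are hypothetical. The content of the DSD segment depends on the device type and is returned undecoded.

```python
import struct


def parse_ctb(raw):
    """Decode the common CTB header and return the device-specific bytes."""
    device_type, device_id, _reserved, dsd_length = struct.unpack_from(
        "<4Q", raw, 0
    )
    return {
        "device_type": device_type,
        "device_index": device_id >> 32,          # DEVICE ID bits <63:32>
        "channel_index": device_id & 0xFFFFFFFF,  # DEVICE ID bits <31:0>
        "dsd": bytes(raw[32:32 + dsd_length]),    # DSD starts at CTB+32
    }


# Hypothetical CTB: serial line UART (type 2), device 1, channel 3,
# followed by four bytes of made-up device-specific data.
raw_ctb = struct.pack("<4Q", 2, (1 << 32) | 3, 0, 4) + b"\x80\x25\x00\x00"
ctb = parse_ctb(raw_ctb)
```

This is read-only by design: system software inspects the CTB but changes terminal characteristics through SET_TERM_CTL rather than by writing the block.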


2.4 Interprocessor Console Communications 


This section considers only those communications between a running processor and a console 
processor. Communications paths between running processors are external to the console. 
Communications paths between console processors are internal to the console. 


Commands are transmitted from a running primary to a console secondary; messages (and 
requests) are transmitted from a console secondary to a running primary. Commands and mes- 
sages are passed via receive (RX) and transmit (TX) buffers contained in each per-CPU slot of 
the HWRPB. The use of these buffers is controlled by the Receive Buffer Ready (RXRDY) 
and Transmit Buffer Ready (TXRDY) flags. 


The transmit and receive buffers are named from the point of view of the console secondary. 
The console secondary receives commands in the RX buffer and transmits messages in the TX 
buffer. 


2.4.1 Interprocessor Console Communications Flags 


The Receive Buffer Ready (RXRDY) and Transmit Buffer Ready (TXRDY) flags are used to 
control the interprocessor console communications. The RXRDY and TXRDY flags are gath- 
ered into bitmasks in the HWRPB at HWRPB[296] and HWRPB[304], respectively. The 
TXRDY bitmask allows a running primary to quickly determine which, if any, of the console 
secondaries are trying to send messages. 


The running primary sets the appropriate RXRDY flag to indicate to the receiving console 
secondary that a command is contained in the secondary’s RX buffer. The secondary is assumed 
to be polling its RXRDY flag. The secondary clears the RXRDY flag after the command 
has been read from the RX buffer and before executing the command. 


A console secondary sets its TXRDY flag to indicate to the running primary that a message is 
contained in the secondary’s TX buffer. The console generates an interprocessor interrupt to 
the primary to notify it that a message is ready. System software clears the TXRDY flag after 
the message has been read from the TX buffer and before processing the message. 


Implementation Note: 


The TXRDY bitmask minimizes interprocessor interrupt service overhead by reducing the 
number of required memory lookups. 
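
To illustrate why a gathered bitmask is convenient, the sketch below lists the CPU IDs with a flag set in a 64-bit RXRDY or TXRDY bitmask. It assumes bit n corresponds to the processor with CPU ID n; the function name is hypothetical.

```python
def ready_cpus(bitmask):
    """Return the CPU IDs whose bit is set in a 64-bit ready bitmask."""
    return [cpu for cpu in range(64) if (bitmask >> cpu) & 1]


# Secondaries 2 and 4 have messages waiting in this made-up TXRDY value.
waiting = ready_cpus(0b10100)
```

A single quadword read is enough to learn which per-CPU slots need attention, instead of visiting every slot.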


2.4.2 Interprocessor Console Communications Buffer Area 


Each per-CPU slot of the HWRPB includes an RXTX Buffer Area that provides the communications 
path between processors. The buffer area is controlled by the RXRDY and TXRDY 
flags. The format is shown in Figure 2-9 and described in Table 2-13. 


Figure 2-9: Inter-Console Communications Buffer 

 63                    32 31                     0 
 TXLEN                    RXLEN                  :SLOT+296 
 RX Buffer (80₁₀ bytes)                          :SLOT+304 
 TX Buffer (80₁₀ bytes)                          :SLOT+384 
                                                 :SLOT+464 


Table 2-13: Inter-Console Communications Buffer Fields 


Offset Description 


SLOT+296 RXLEN — If the bit corresponding to this processor is set in the RXRDY bitmask 
at HWRPB[296], the RXLEN field contains the length in bytes of the 
command in the RX buffer. 

+300 TXLEN — If the bit corresponding to this processor is set in the TXRDY bitmask 
at HWRPB[304], the TXLEN field contains the length in bytes of the 
message in the TX buffer. 

+304 RX BUFFER — Buffer used by this console secondary to receive a command 
from the running primary. Only command data is passed through this buffer; a 
console secondary does not receive messages from the running primary. Commands 
must end with "<CR><LF>" (0A0D₁₆). 

+384 TX BUFFER — Buffer used by this console secondary to transmit a message 
to the running primary. Only message data is passed through this buffer; a 
console secondary does not send commands to the running primary. Messages 
must end with the console secondary’s prompt, "<CR><LF>Pnn>>>" 
(3E3E 3Enn nn50 0A0D₁₆). 
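
The hexadecimal values quoted for the terminators are the byte strings read as little-endian integers, which the following sketch makes explicit. The helper names are hypothetical, and rendering "nn" as two decimal digits of the CPU ID is an assumption made only for the example.

```python
BUFFER_SIZE = 80  # each RX and TX buffer holds 80 bytes


def command_bytes(text):
    """Return an ASCII command terminated by <CR><LF>, checked for size."""
    data = text.encode("ascii") + b"\r\n"
    if len(data) > BUFFER_SIZE:
        raise ValueError("command does not fit in the RX buffer")
    return data


def prompt_bytes(cpu_id):
    """Return the message terminator <CR><LF>Pnn>>> (nn assumed decimal)."""
    return b"\r\nP%02d>>>" % cpu_id


def little_endian_hex(data):
    """Render bytes as the hex value of the little-endian integer they form."""
    return "%0*X" % (2 * len(data), int.from_bytes(data, "little"))


crlf_hex = little_endian_hex(b"\r\n")
prompt_hex = little_endian_hex(prompt_bytes(1))
```

With nn rendered as "01", the prompt bytes read back as 3E3E 3E31 3050 0A0D, matching the pattern given for the TX buffer terminator.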


2.4.3 Sending a Command to a Secondary 


The running primary manipulates the secondary’s RXRDY flag and RX buffer in the follow- 
ing manner to send a command to a console secondary. 


Programming Note: 


The RXRDY flag is a software lock variable; the primary and the secondary must use 
LDQ_L/STQ_C instructions to set and clear bit n. See Common Architecture (I). 


In the following sequence, the console secondary is assumed to have CPU ID = n. 


1. The primary examines bit n of the RXRDY bitmask. If the bit is clear, proceed to step 3. 


2. The primary polls bit n of the RXRDY bitmask until clear or until some timeout is 
reached. If a timeout occurs, system software reports an error and takes appropriate 
action. 


3. The primary moves the text of the desired console command into the RX buffer in the 
secondary’s HWRPB slot (the nth per-CPU slot). 


4. The primary sets the length of the command into the RXLEN field in the secondary’s 
HWRPB slot (the nth per-CPU slot). 


5. The primary sets bit n of the RXRDY bitmask to indicate there is a command waiting. 


6. The secondary is assumed to be polling bit n of the RXRDY bitmask. 


7. When the secondary notices that bit n of the RXRDY bitmask is set, it removes the 
command from its RX buffer. 


8. The secondary clears bit n of the RXRDY bitmask, indicating that its RX buffer is again 
available. 


9. The secondary attempts to process the command. 
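
The command handshake above can be modelled in a few lines. The sketch below is a toy, single-threaded model, not console or operating system code: the class `CommandChannel` and its methods are hypothetical, a `threading.Lock` stands in for the LDQ_L/STQ_C sequence that real code must use to update the bitmask, and the poll limit stands in for the unspecified timeout.

```python
import threading


class CommandChannel:
    """Toy model of the RXRDY bitmask and the per-CPU RX fields."""

    def __init__(self, cpus):
        self.rxrdy = 0
        self._lock = threading.Lock()  # stand-in for LDQ_L/STQ_C
        self.slots = [{"rxlen": 0, "rx": bytearray(80)} for _ in range(cpus)]

    def _update(self, n, value):
        with self._lock:
            if value:
                self.rxrdy |= 1 << n
            else:
                self.rxrdy &= ~(1 << n)

    def primary_send(self, n, command, max_polls=1000):
        if len(command) > 80:
            raise ValueError("command does not fit in the RX buffer")
        # Examine bit n; poll until it is clear or the timeout is reached.
        polls = 0
        while (self.rxrdy >> n) & 1:
            polls += 1
            if polls >= max_polls:
                raise TimeoutError("secondary did not free its RX buffer")
        slot = self.slots[n]
        slot["rx"][:len(command)] = command  # move the command text
        slot["rxlen"] = len(command)         # then record its length
        self._update(n, True)                # finally set bit n

    def secondary_poll(self, n):
        # Return the pending command, or None if bit n is clear.
        if not (self.rxrdy >> n) & 1:
            return None
        slot = self.slots[n]
        command = bytes(slot["rx"][:slot["rxlen"]])  # remove the command
        self._update(n, False)                       # buffer available again
        return command                               # caller processes it


channel = CommandChannel(cpus=4)
channel.primary_send(2, b"START\r\n")
pending_mask = channel.rxrdy
received = channel.secondary_poll(2)
```

The ordering matters: the buffer and length are written before the flag is set, and the secondary copies the command out before clearing the flag, so the flag always guards a consistent buffer.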


2.4.3.1 Sending a Message to the Primary 


The console secondary manipulates its TXRDY flag and TX buffer in the following manner to 
return a message to the running primary. 


Programming Note: 


The TXRDY flag is a software lock variable; the primary and the secondary must use 
LDQ_L/STQ_C instructions to set and clear bit n. See Common Architecture (I). 


Again, the console secondary is assumed to have CPU ID = n. 


1. The secondary examines bit n of the TXRDY bitmask. If the bit is clear, proceed to step 
3. 


2. The secondary polls this bit until it clears or until a long timeout occurs. (See step 7.) 


3. The secondary moves the text of its response message into the TX buffer in the secondary’s 
HWRPB slot (the nth per-CPU slot). 


4. The secondary sets the length of the message into the TXLEN field in the secondary’s 
HWRPB slot (the nth per-CPU slot). 


5. The secondary sets bit n of the TXRDY bitmask to indicate there is a message waiting. 


6. The secondary issues an interprocessor interrupt to the primary. This is always done; 
the primary need not poll for bits in the TXRDY bitmask. 


7. The secondary polls the TXRDY bitmask until bit n clears or until a long timeout 
expires. This prevents the secondary from performing any action that might cause the 
message to be lost before the primary can process it. 


Programming Note: 

The secondary may be restarted once it has transmitted the error halt message to 
the primary. However, it must wait for the primary to have a reasonable chance to 
respond to the interprocessor interrupt and process the message before the restart 
proceeds, because that message is important visible evidence of the error halt 
condition. On the other hand, the secondary should not wait too long for the 
primary to respond because the primary may be affected by the same condition 
that caused the secondary to error halt. Hence, the need for a timeout that is of 
reasonable length. 


8. As a result of the interprocessor interrupt, the primary eventually checks for console 
messages by examining the TXRDY bitmask. The primary notices that bit n of the 
TXRDY bitmask is set. 


9. The primary removes the message from the TX buffer. 


10. The primary clears bit n of the TXRDY bitmask, indicating that the TX buffer is again 
available. 


11. The primary attempts to process the message. 
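
The message path, including the bounded wait on the secondary side, can be sketched in the same spirit. Again this is a toy model: `TxState`, `secondary_send`, and `make_primary_handler` are hypothetical names, the interprocessor interrupt is modelled as a direct function call, plain integer updates stand in for LDQ_L/STQ_C, and the message text is made up.

```python
class TxState:
    """Toy model of the TXRDY bitmask and per-CPU TX fields."""

    def __init__(self):
        self.txrdy = 0
        self.txlen = {}
        self.tx = {}


def wait_until_clear(state, n, max_polls):
    """Poll bit n; return True if it clears before the timeout."""
    for _ in range(max_polls):
        if not (state.txrdy >> n) & 1:
            return True
    return False


def secondary_send(state, n, message, interrupt_primary, max_polls=1000):
    """Return True if the primary consumed the message before the timeout."""
    if not wait_until_clear(state, n, max_polls):
        return False
    state.tx[n] = bytes(message)   # move the message text
    state.txlen[n] = len(message)  # record its length
    state.txrdy |= 1 << n          # mark the message as waiting
    interrupt_primary(state)       # always notify the primary
    # Wait, but not forever: the primary may itself be in trouble.
    return wait_until_clear(state, n, max_polls)


def make_primary_handler(inbox):
    """Build an interrupt handler that drains every ready TX buffer."""
    def handler(state):
        for n in range(64):
            if (state.txrdy >> n) & 1:
                message = state.tx[n][:state.txlen[n]]  # remove the message
                state.txrdy &= ~(1 << n)                # buffer free again
                inbox.append((n, message))              # then process it
    return handler


inbox = []
state = TxState()
delivered = secondary_send(
    state, 1, b"halted\r\nP01>>>", make_primary_handler(inbox)
)
# A primary that never responds: the secondary gives up after the timeout.
ignored = secondary_send(
    TxState(), 1, b"halted\r\nP01>>>", lambda s: None, max_polls=10
)
```

The second call shows the case the Programming Note is concerned with: the flag stays set, and the secondary proceeds after its timeout rather than waiting indefinitely.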


2.5 Revision History 


Revision 7.0, November 10, 1997 

1. Added ECO 99, extended VA address 
2. Added ECO 105, definition of PALcode available field 
3. Added ECO 107, BIOS_EMUL callback routine 
4. DEC OSF/1 --> DIGITAL UNIX 
5. OpenVMS AXP --> OpenVMS Alpha 
6. Removed AXP 


Revision 6.0, December 12, 1994 

1. Removed "two’s complement" in checksum text 
2. ULTRIX --> DEC OSF/1 
3. Alpha --> Alpha AXP 
4. Add ECO 56, DSRDB in HWRPB 
5. Add ECO 53, reverse major/minor field description 
6. Added reference to illustration [V-2-3 
7. Add ECO 39, PALcode switching 
8. Redefine FIXUP description in 2.3.8.1.2 from A_SRM note 173.1 
9. ISO-LATIN-1 to ISO Latin-1 
10. Corrected fig 2-1, HWRPB Overview, for TRB table, CONFIG table, FRU table 
11. Corrected fig 2-2 for CONFIG table, FRU table 
12. Removed R25, R27 from console callbacks 
13. Commented out the Implementation Considerations 
14. Made software note for execution time of variable routines generic 


Revision 5.0, May 12, 1992 

1. Integrated ECO 30 
2. Widget --> Device or Controller, as appropriate 
3. VMS --> OpenVMS 
4. Converted appropriate internal text to various ‘notes’ 
5. Convert to SDML 


Revision 4.1, August 12, 1991 

1. Replace previous Console Chapter with Console ECO 15 
2. Includes 3 chapters and two appendices, renumber I/O Chapter 
3. Material substantially changed or rearranged 




Chapter 3 


System Bootstrapping (III) 


This chapter describes the net effects of the action of the console to control the system plat- 
form hardware. The major system state transitions and the role of the console in controlling 
those transitions are described in Section 3.1.1. When power is applied to an Alpha system, 
the console initializes the system as explained in Section 3.2. The console actions necessary to 
bootstrap system software include processor initialization (Section 3.4.1.5), memory sizing 
and testing (Section 3.4.1.1), building an initial virtual address space (Section 3.4.1.2), and 
loading the bootstrap (Section 3.6). The console actions to restart system software are 
described in Section 3.5. 


3.1 Processor States and Modes 


3.1.1 States and State Transitions 


An Alpha processor can be in one of five major states: 
1. Powered off — no system power supplied to the processor 
2. Halted — operating system software execution suspended 
3. Bootstrapping — attempting to load and start the operating system software 
4. Restarting — attempting to restart the operating system software 
5. Running — operating system software functioning 


As shown in Figure 3-1, the transitions between the major states are determined by the current 
state and by a number of variables and events, including: 


• Whether power is available to the system 

• The console AUTO_ACTION environment variable, which specifies a "Halt action" 
(see CALL_PAL HALT) 

• The console lock setting 

• The Bootstrap-in-Progress (BIP) flags 

• The Restart-Capable (RC) flags 

• Processor error halts 

• The CALL_PAL HALT instruction 

• Console commands 


Figure 3-1: Major State Transitions 

Initial State       Action Causing Transition                    Final State 
Any powered state   Powerfail                                    Off 
Off                 A and Power Restored                         Halted 
Off                 B and Power Restored                         Booting 
Off                 C and Power Restored                         Restart 
Halted              BOOT and Console Is Unlocked                 Booting 
Halted              START or CONTINUE and Console Is Unlocked    Running 
Booting             Bootstrap Fails or D                         Halted 
Booting             Bootstrap Succeeds                           Running 
Restart             D                                            Halted 
Restart             Restart Fails                                Booting 
Restart             Restart Succeeds                             Running 
Running             A and Processor Halts or D                   Halted 
Running             B and Processor Halts                        Booting 
Running             C and Processor Halts                        Restart 



Key to Figure 3-1 

A Console is unlocked and AUTO_ACTION is "HALT". 

B Console is unlocked and AUTO_ACTION is "BOOT". 

C Console is unlocked and AUTO_ACTION is "RESTART" or console is locked. 

D Console is unlocked, the processor is forced into console I/O mode. 


To effect major state transitions, the console obeys these rules: 


• If the console is unlocked when power is restored or when the processor halts, enter the 
state selected by the console AUTO_ACTION environment variable. 


• If the console is locked when power is restored or when the processor halts, attempt a 
processor restart. 


• When processor restart fails, attempt a bootstrap of that processor. One cause of a failed 
restart is the processor’s RC flag being clear when the console attempts the restart. 


• When system bootstrap fails, halt. One cause of a failed bootstrap is the processor’s BIP 
flag being set before the console attempts the bootstrap. Only the processor that failed 
bootstrap will halt. 


• When system bootstrap or processor restart succeeds, the processor starts running. 


• When the primary processor is halted and the console is unlocked, the console BOOT 
command causes a system bootstrap. 


• When a secondary processor is halted and the console is unlocked, the console START 
-CPU command causes the console to attempt to start that processor running. 


• When a processor is halted and the console is unlocked, the console CONTINUE command 
causes the processor to continue running as though no halt was incurred. 


• If the console is unlocked and a specified processor is running or booting or restarting, 
that processor is halted by a console HALT -CPU command. 


Implementation Note: 
In an embedded console implementation, the primary processor must be forced 
into the console I/O mode before issuing the HALT -CPU command. 
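
The automatic part of these rules (power restoration, processor halts, and the outcome of restart and bootstrap attempts) can be expressed as a small lookup. The sketch below is illustrative only: the function names and the outcome labels such as "restart_fails" are hypothetical, and operator commands (BOOT, START, CONTINUE, HALT) are deliberately left out.

```python
# State entered on power restore or processor halt when the console is unlocked.
AUTO_ACTION_STATE = {
    "HALT": "Halted",
    "BOOT": "Bootstrapping",
    "RESTART": "Restarting",
}

# State entered when a restart or bootstrap attempt completes.
OUTCOME_STATE = {
    ("Restarting", "restart_fails"): "Bootstrapping",
    ("Restarting", "restart_succeeds"): "Running",
    ("Bootstrapping", "bootstrap_fails"): "Halted",
    ("Bootstrapping", "bootstrap_succeeds"): "Running",
}


def after_power_restore_or_halt(console_locked, auto_action):
    """A locked console always attempts a restart; otherwise AUTO_ACTION decides."""
    if console_locked:
        return "Restarting"
    return AUTO_ACTION_STATE[auto_action]


def settle(console_locked, auto_action, outcomes):
    """Follow a sequence of attempt outcomes and return the resulting state."""
    state = after_power_restore_or_halt(console_locked, auto_action)
    for outcome in outcomes:
        state = OUTCOME_STATE[(state, outcome)]
    return state


# Locked console: restart is attempted, fails, and the fallback bootstrap succeeds.
final = settle(True, "HALT", ["restart_fails", "bootstrap_succeeds"])
```

Note that AUTO_ACTION is ignored while the console is locked, which is why the example reaches Running even though AUTO_ACTION is "HALT".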


3.1.2 Major Modes 


In addition to the major states, the console and processor are described as being in one of three 
modes: 


1. Program I/O mode 

The processor is running. The processor interprets instructions, services interrupts and 
exceptions, and initiates I/O operations under the control of the operating system. 


2. Console I/O mode 

The processor is halted or bootstrapping or restarting. The console provides control 
over the system; the operating system has either relinquished control or has yet to gain 
control. The operating system does not service interrupts or exceptions or initiate I/O 
operations. The actions of the console are determined by internal console state and 
commands from the console operator. 


3. Console Initialization mode 

The console has yet to acquire control of the processor. The console itself may also 
require initialization, such as when power is first applied to the system. 


A given processor may be in one of four modes: 

• Primary processor in program I/O mode or "running primary" 
• Primary processor in console I/O mode or "console primary" 
• Secondary processor in program I/O mode or "running secondary" 
• Secondary processor in console I/O mode or "console secondary" 


As noted in Section 1.1, implementations must include a mechanism to force a processor exe- 
cuting in program I/O mode into console I/O mode. 


3.2 System Initialization 


An Alpha system must be initialized when power is restored. System initialization also occurs 
as the result of a system bootstrap when the BOOT_RESET environment variable is set to 
"ON", or as the result of the console INITIALIZE command. Initialization involves all imple- 
mentation-specific, system-wide actions necessary to allow the system to boot system 
software on the primary processor. Table 3-1 summarizes the effects of initialization as seen 
by system software. 


Initialization may include initialization of the console itself. During console initialization, the 
console must build the HWRPB and all associated data structures necessary to permit the con- 
sole to accept console commands and boot system software. 


System initialization may also include any necessary system bus, processor, or I/O device ini- 
tialization. The initialization of a processor performed as part of system initialization is not 
necessarily that performed just before transfer of control to the operating system bootstrap. 
See Section 3.4.1.5 for a description of processor initialization as seen by system software. 


Table 3-1: Effects of Power-Up Initialization 


Processor State                        Initialized State 

BIP and RC flags                       Cleared 
Reason for halt code                   ‘0’ (bootstrap) 
Integer and floating-point registers   UNPREDICTABLE 
System memory                          Unaffected if preserved by battery backup; otherwise, 
                                       UNPREDICTABLE 
Environment variables                  Unaffected if nonvolatile; otherwise, set to default 
BB_WATCH                               Unaffected 
I/O device registers                   UNPREDICTABLE 


3.3 PALcode Loading and Switching 


3.3.1 PALcode Loading 


The console loads PALcode into good memory within a memory cluster that is not available to 
system software. If PALcode scratch space is required, the console allocates good memory 
within a memory cluster that is not available to system software. PALcode memory and 
scratch space are at least page aligned. The console records the starting physical address and 
length of PALcode memory and scratch space and then sets the PALcode Memory Valid 
(PMV) flag in the per-CPU slot of the primary processor. The PMV flag indicates that the 
PALcode descriptors are valid. 


After PALcode loading and initialization, the console sets the PALcode Loaded (PL) and PAL- 
code Valid (PV) flags in the primary’s per-CPU slot. The PL flag indicates that PALcode has 
been loaded; the PV flag indicates that any necessary PALcode initialization has been 
performed. 


PALcode loading and initialization are implementation specific. The PALcode source may be 
a special console device, ROM, a system device, a communications line, or any other imple- 
mentation-specific source. The state of the console and system must be such that the source is 
accessible. The console determines the PALcode variant in an implementation-specific fash- 
ion; console implementations that are dependent on a given variant load that variant. Console 
and platform implementations may select any PALcode variant and may load multiple PAL- 
code variants. 


Note: 
DIGITAL UNIX supports PALcode switching but does not support PALcode loading. 
Any platform that supports DIGITAL UNIX must either use the DIGITAL UNIX variant 
as the default or must load (but need not switch to) the DIGITAL UNIX variant before 
system bootstrap. 


The means by which any PALcode internal state is initialized is implementation specific. 


3.3.2 PALcode Switching 


PALcode switching is accomplished when one ("current") PALcode transfers control to 
another ("new") PALcode. PALcode switching can be initiated by the console or the operating 
system software. 


Note: 
OpenVMS Alpha does not support PALcode switching. Any platform that supports 
OpenVMS Alpha must either use the OpenVMS Alpha variant as the default or must 
switch to the OpenVMS Alpha variant before system bootstrap. 


PALcode switching is performed by PALcode without intervention from the console or operat- 
ing system software. The current PALcode must be able to locate the new PALcode image. 
The new PALcode may perform minimal sanity checks. 


To support PALcode switching, all PALcode images must implement a PALcode switching 
entry point at the image base (offset 0). During PALcode switching, the new PALcode image 
receives control from the current PALcode image at this offset. 


For the purposes of switching, a PALcode image is identified by one of the following: 
• PALcode variant 


PALcode variants are in the range 0 < variant < 256 and permit switching between 
cooperating, previously loaded PALcode images. PALcode variants are interpreted by 
the current PALcode without assistance from the console or operating system. 


• The physical address of the switching entry point. 


Entry point addresses are used whenever the operating system or console must load a 
PALcode image. Entry point addresses must meet the alignment requirements of the 
processor implementation and may occupy the lowest memory page. 


System software initiates PALcode switching during system bootstrap whenever the variant 
required is not identical to that supplied by the console. Once a new variant has been estab- 
lished by system software, the console must restore that variant across all subsequent 
transitions from console I/O mode to program I/O mode. The console must ensure that the sys- 
tem software PALcode variant appears unchanged when: 


1. A processor is restarted. 

2. A secondary processor is started. 

3. The operator forces a processor into console I/O mode, then continues program execu- 
tion (HALT followed by CONTINUE). 

4. System software invokes a callback routine that requires transition to console I/O mode. 


System software is never required to restore a PALcode variant. The console may switch PAL- 
code at entries to console I/O mode, but must restore the variant established by system 
software at subsequent re-entry to program I/O mode. 


\To register a new PALcode variant, contact the Alpha architecture group at 
TALLIS::ALPHA_SRM or ALPHA_SRM@TALLIS.ENET.DEC.COM.\ 


3.3.2.1 PALcode Switching Procedure 


PALcode switching proceeds as follows: 


1. The current PALcode is entered by the CALL_PAL SWPPAL instruction. The PALcode 
image identifier (variant or switching entry point address) is contained in R16. Regis- 
ters R17 through R21 contain parameters that are passed without change to the new 
PALcode image. The interpretation of R17 through R21 is specific to the new PALcode 
image. 


2. If the current PALcode is not supplied by DIGITAL and does not support PALcode 
switching, the current PALcode sets R0 = 1 and returns from the CALL_PAL SWPPAL 
instruction. 


3. The current PALcode determines if R16 contains a PALcode variant or switching entry 
point address. If the latter, execution continues at step 7. 


4. The current PALcode validates the PALcode variant. If unsuccessful, the operation 
fails, the current PALcode sets R0 = 1 and returns from the CALL_PAL SWPPAL 
instruction. 


5. The current PALcode determines if the PALcode associated with the PALcode token 
has been loaded. If not, the operation fails, the current PALcode sets R0 = 2 and returns 
from the CALL_PAL SWPPAL instruction. 


6. The current PALcode determines the base physical address associated with the PAL- 
code token. 


7. The current PALcode branches to the new PALcode image at the switching entry point 
(physical) address determined in step 3 or 6. 


8. The new PALcode performs any necessary implementation-specific PALcode initializa- 
tion. 


10. The new PALcode performs any implementation-specific actions using the entry 
parameters contained in R17 through R21. The resulting changes in processor state are 
summarized for each PALcode variant in Section 3.3.2.3. 


11. The new PALcode clears R0 and passes control to the code thread determined by the 
entry parameters. Control is always passed in kernel mode with interrupts disabled or 
blocked. 
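

The status decisions made by the current PALcode in the procedure above can be modeled in a few lines. The following is a minimal sketch, not PALcode: the CurrentPal class, its fields, and the swppal function are hypothetical names, and the rule that an R16 value below 256 is a variant (anything else being an entry point address) is an assumption drawn from the stated variant range. Validation of a variant is modeled as simple set membership.

```python
from dataclasses import dataclass, field

VARIANT_LIMIT = 256  # assumption: R16 values below this are treated as variants


@dataclass
class CurrentPal:
    """Hypothetical model of the state the current PALcode consults."""
    supports_switching: bool = True
    known: set = field(default_factory=set)     # variants considered valid
    loaded: dict = field(default_factory=dict)  # variant -> base physical address


def swppal(pal, r16):
    """Return (R0, entry_point). entry_point is None when the switch fails."""
    if not pal.supports_switching:
        return 1, None                 # switching not supported: R0 = 1
    if r16 >= VARIANT_LIMIT:
        return 0, r16                  # R16 is an entry point address; branch there
    if r16 not in pal.known:
        return 1, None                 # variant fails validation: R0 = 1
    if r16 not in pal.loaded:
        return 2, None                 # variant valid but not loaded: R0 = 2
    return 0, pal.loaded[r16]          # branch to image base; new PALcode clears R0


pal = CurrentPal(known={1, 2}, loaded={2: 0x20000})
print(swppal(pal, 2))
print(swppal(pal, 1))
```

In this model a return of R0 = 0 stands for the successful path, in which the new PALcode image, not the caller, receives control.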


If a hardware failure occurs when accessing any of the addresses specified by the calling argu- 
ments or other dependent locations, a hardware reset and system initialization are performed. 


Implementation Note: 


A common implementation is that the switching entry point is identical to the hardware 
reset entry. PALcode must distinguish the two cases. In the case of hardware reset, 
PALcode must perform any necessary hardware initialization and pass control to the 
console. In the case of switching, PALcode must pass control to the code thread 
determined by the entry parameters. 


Notes: 


• System software must update the PALcode revision field (SLOT[168]) after PALcode 
switching. The console uses that field to determine if PALcode must be switched (to the 
system software-specific image) before passing control on system restarts. 


Similarly, system software may need to update the PALcode revision field in the 
per-CPU slot (SLOT[168]) of each secondary processor before starting the secondary. 
There is only one system restart routine. The console uses the PALcode revision field 
to determine if PALcode must be switched (to the system software-specific image) 
before passing control on secondary processor starts. 


• PALcode switching is initiated by invoking the CALL_PAL SWPPAL instruction. 
Before invoking SWPPAL, the caller should ensure that the system is quiescent. It is 
recommended that SWPPAL be invoked with interrupts either disabled or blocked. 
After a successful PALcode switch, the operating system may need to update the VPTB 
field in the HWRPB or restart HWPCB in each per-CPU slot. 


• PALcode switching does not implicitly load PALcode. During system bootstrap, the 
operating system must ensure that the desired PALcode variant is loaded. If loading is 
required, the operating system must allocate sufficient physically contiguous physical 
memory for the new PALcode image and any additional PALcode scratch space, then 
load the PALcode image in an implementation-specific manner. 


• After a PALcode switch, the operating system may need to invoke the FIXUP console 
callback routine. FIXUP must be invoked after any operation that affects virtual address 
translation and before subsequent invocations of other callback routines. See the 
description of the FIXUP callback routine. 


3.3.2.2 Specific PALcode Switching Implementation Information 


OpenVMS Alpha does not currently support PALcode switching. DIGITAL UNIX supports 
PALcode switching as shown in Table 3-2. 


Table 3-2: DIGITAL UNIX PALcode Switching 


Register     CALL_PAL swppal Parameter Usage 

R17 (a1)     New PC 
R18 (a2)     New PCBB 
R19 (a3)     New VPTB 


3.3.2.3 Processor State at Exit for DIGITAL UNIX from PALcode Switching Instruction 


Table 3-3: Processor State for DIGITAL UNIX at Exit from swppal 


Processor State                            At Exit from swppal 

ASN       Address space number             ASN in PCB passed to swppal 
FEN       Floating enable                  FEN in PCB passed to swppal 
IPL       Interrupt priority level         7 
MCES      Machine check error summary      Zero 
PCBB      Privileged context block         Address of PCB passed to swppal 
PC        Program counter                  PC passed to swppal 
PS        Processor status                 IPL=7, CM=K 
PTBR      Page table base register         PTBR in PCB passed to swppal 
Unique    Processor unique value           Unique in PCB passed to swppal 
WHAMI     Who-Am-I                         Unchanged 
Sysvalue  System value                     Unchanged 
KSP       Kernel stack pointer             KSP in PCB passed to swppal 
Other IPRs                                 UNPREDICTABLE 
R0                                         Zero 
Integer and floating-point registers       UNPREDICTABLE, except SP and R0 




3.4 System Bootstrapping 


This section describes the operations performed by the Alpha console to locate, load, and trans- 
fer control to a primary bootstrap. The responsibilities of the console and the initial state seen 
by system software are presented for multiprocessor and uniprocessor environments. The 
actions of the console for cold bootstrap (full hardware initialization) and warm bootstrap (par- 
tial hardware initialization) are described. 


A system bootstrap can occur as the result of a powerfail recovery, a processor halt, or an INI- 
TIALIZE or BOOT console command. See Section 3.1.1 for a complete description of these 
state transitions. 


3.4.1 Cold Bootstrapping in a Uniprocessor Environment 
This section describes a cold bootstrap in a uniprocessor environment. A system bootstrap is a 
cold bootstrap when any of the following occur: 
• Power is first applied to the system. 
• The bootstrap is requested by system software. 
• A console INITIALIZE command is issued and the AUTO_ACTION environment 
variable is set to "BOOT". 
• The BOOT_RESET environment variable is set to "ON". 
The console must perform the following steps in the cold bootstrap sequence. 

1. Perform a system initialization 
2. Size memory 
3. Test sufficient memory for bootstrapping 
4. Determine whether to configure a standard three-level page table or an extended 
four-level page table structure 
5. Load PALcode 
6. Build a valid Hardware Restart Parameter Block (HWRPB) 
7. Build a valid Memory Data Descriptor table in the HWRPB 
8. Initialize bootstrap page tables and map initial regions 
9. Locate and load the system software primary bootstrap image 
10. Initialize processor state on all processors 
11. Transfer control to the system software primary bootstrap image 


The steps leading up to the transfer of control to system software may be performed in any 
order. The final state seen by system software is defined, but the implementation-specific 
sequence of these steps is not. Before beginning a bootstrap, the console must clear any inter- 
nally pended restarts to any processor. 


3.4.1.1 Page Table Structure Configuration 


An Alpha implementation must support a mode of operation whereby the virtual address for- 
mat contains three levels of page table fields, plus a byte-within-page field. 


An Alpha implementation additionally may support an extended mode of operation whereby 
the virtual address format contains four levels of page table fields, plus a byte-within-page 
field. 


The console must support a mechanism that lets the user specify the desired mode of opera- 
tion. The console then configures the system accordingly, as it prepares the bootstrap 
environment for software. This mechanism must be persistent, such that once specified, the 
user’s choice remains in force for all subsequent bootstraps or until the user chooses a differ- 
ent mode of operation. 


3.4.1.2 Memory Sizing and Testing 


Memory sizing is the responsibility of the console. The console must also test sufficient mem- 
ory to permit control to be passed to the primary bootstrap image. The results of console 
memory sizing and testing are passed to system software in the Memory Data Descriptor 
(MEMDSC) table located by HWRPB[200]. 


The MEMDSC table contains one or more memory cluster descriptors. Each memory cluster 
descriptor describes a physically contiguous extent of physical memory that contains no holes. 
Cluster descriptors are ordered by increasing physical address; the range of PFNs described by 
cluster N is of lower address than the range of PFNs described by cluster N+1. 


The MEMDSC table must be quadword aligned and both physically and virtually contiguous. 
The MEMDSC table format is shown in Figure 3-2; the memory cluster descriptor format is 
shown in Figure 3—3. The size of the MEMDSC table can be determined by the number of 
clusters contained in MEMDSC[16]. The size of the table and the offset to the last quadword 
of the table are given by: 


MEMDSC_SIZE = ((7 * MEMDSC[10₁₆]) + 3) * 8 
MEMDSC_END = MEMDSC_SIZE - 8 
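

As a worked illustration of the two formulas, the sketch below computes the table size from the cluster count and then sums the quadwords from MEMDSC+8 through MEMDSC_END as a 64-bit checksum. It is an informal model only: the function names are hypothetical, the table is assumed to be available as a little-endian byte string, and each descriptor is taken to be seven quadwords as the size formula implies.

```python
import struct

MASK64 = (1 << 64) - 1


def memdsc_size(clusters):
    """Size in bytes: three header quadwords plus seven quadwords per cluster."""
    return (7 * clusters + 3) * 8


def memdsc_checksum(table):
    """64-bit sum of the quadwords at MEMDSC+8 through MEMDSC_END, ignoring overflow."""
    clusters = struct.unpack_from("<Q", table, 16)[0]   # CLUSTERS field at offset +16
    end = memdsc_size(clusters) - 8                     # MEMDSC_END
    total = 0
    for offset in range(8, end + 8, 8):                 # inclusive of MEMDSC_END
        total = (total + struct.unpack_from("<Q", table, offset)[0]) & MASK64
    return total


# One-cluster example: header (checksum, IMP_DATA_PA, CLUSTERS) + one descriptor.
words = [0, 0, 1, 0x100, 0x40, 0x40, 0, 0, 0, 1]
table = struct.pack("<10Q", *words)
print(memdsc_size(1), memdsc_checksum(table))
```

A consumer of the table would compare the computed sum against the CHECKSUM quadword at offset 0 before trusting the descriptors.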


The memory within a cluster is either available to system software or reserved for console use. 
Usage within a cluster cannot be mixed; if the cluster contains a page reserved for console use, 
system software cannot allocate any page within the cluster. The memory cluster descriptor 
contains a cluster usage field that indicates the cluster availability to system software. The pri- 
mary bootstrap image must reside in clusters available to system software. 


The memory within each cluster may be fully tested, partially tested, or untested by the con- 
sole. If the memory is untested, no cluster memory bitmap is built. The console must test 
enough memory to allow the primary bootstrap image to be loaded and control to be passed to 
that image. This memory includes: 


• PALcode memory and scratch areas 
• CPU logout areas 
• Memory bitmaps 
• HWRPB and all offset blocks 
• Console CRB map entries 
• Bootstrap address space page tables 
• Primary bootstrap image 
• One page for the initial bootstrap stack 


Any additional memory testing by the console is implementation specific. It is the responsibil- 
ity of system software to test any memory not tested by the console. 


A cluster bitmap is built if the cluster is available to system software and the console tests any 
memory within the cluster. Each page in the cluster is represented by a bit in the bitmask. A 
‘1’ in the bitmap means that the corresponding page is "good"; the page was tested without 
error. A ‘0’ in the bitmap means that the corresponding page is "bad"; the page is either 
untested or was tested but encountered correctable (Corrected Read Data) errors or hard (Read 
Data Substitute) errors. 


Cluster bitmaps must be at least quadword aligned and must be an integral number of quad- 
words; any unused bits in the highest addressed quadword must be zero. 
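

The bitmap layout rules can be sketched as follows. This is an illustration under stated assumptions rather than a definition: the function names are hypothetical, and the mapping of page N to bit (N mod 64) of quadword (N div 64) is assumed, since the bit ordering is not spelled out in this section.

```python
def bitmap_quadwords(pages):
    """Number of whole quadwords needed to hold one bit per page."""
    return (pages + 63) // 64


def build_bitmap(good_flags):
    """Pack per-page good/bad flags into 64-bit quadwords; unused high bits stay zero."""
    quads = [0] * bitmap_quadwords(len(good_flags))
    for page, good in enumerate(good_flags):
        if good:
            quads[page // 64] |= 1 << (page % 64)
    return quads


def page_is_good(quads, page):
    """A '1' bit means the page was tested without error."""
    return bool((quads[page // 64] >> (page % 64)) & 1)


flags = [True] * 70
flags[3] = False                 # page 3 is untested or failed testing
quads = build_bitmap(flags)
print(len(quads), hex(quads[1]), page_is_good(quads, 3))
```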


Implementation Notes: 


Every implementation cannot be required to test all of memory before booting the 
operating system. Partial memory testing is recommended whenever testing is 
time-consuming and would significantly delay the bootstrapping process; the choice is 
implementation specific. The high-water mark mechanism allows implementations to 
completely size memory without testing all of it and indicate to the operating system 
where testing ended. 


Clusters reserved for the use of the console and PALcode do not have associated bitmaps. 
If such a cluster would contain a large number (three or more) of contiguous pages that 
encounter soft read errors or are otherwise unsuitable for console and PALcode, the 
console should consider breaking the bad pages into a separate cluster. This cluster should 
be made available for use by system software, which can possibly reclaim the pages for 
use. 


The console does not alter the Memory Data Descriptor table or any bitmaps across warm 
bootstraps. This permits system software to propagate information on system software 
memory testing and intermittent errors across operating system bootstraps. For example, 
system software could set the "bad" bit of a page that incurred repeated CRD errors. 


Figure 3-2: Memory Cluster Descriptor Table 


Offset Description 


MEMDSC CHECKSUM — Checksum of all the quadwords from offset MEMDSC+8 
through MEMDSC_END. Computed as a 64-bit sum, ignoring overflows. The 
checksum does not include any of the cluster bitmaps or any optional imple- 
mentation-specific data. 


+08 IMP_DATA_PA — Physical address of additional implementation-specific 
information (if any). If no additional implementation-specific information 
exists, the field must contain a zero. 


+16 CLUSTERS — Number of clusters in the Memory Cluster Descriptor table. 
Unsigned integer. 


+24 CLUSTER — Each Memory Cluster Descriptor describes an extent of physi- 
cal memory. See Figure 3-3. 


Figure 3-3: Memory Cluster Descriptor 


Table 3-5: Memory Cluster Descriptor Fields 


Offset Description 

MEMC PFN — Starting PFN of the memory cluster. 

+08 PAGES — Number of pages in the memory cluster. Unsigned integer. 


+16 TESTED_PAGES — Number of tested memory pages in the cluster. If only a 
limited extent of the cluster memory was tested, a bitmap is built, and this field 
indicates the number of pages that were tested. 


+24 BITMAP_VA — Starting virtual address of the cluster memory testing bitmap 
in the bootstrap address space. If the memory is untested, no bitmap is built and 
this field is set to zero. 


+32 BITMAP_PA — Starting physical address of the cluster memory testing bitmap. 
If the memory is untested, no bitmap is built and this field is set to zero. 
+40 BITMAP_CHECKSUM — Checksum of the cluster memory testing bitmap. 
Computed as a 64-bit sum, ignoring overflows, over the PAGES active bits only. 


+48 USAGE — Indicates whether the cluster is available for use by system software. 

• If USAGE<0> is ‘0’, system software may allocate and use the cluster. 

• If USAGE<0> is ‘0’ and USAGE<1> is ‘1’, the cluster is available for 
use by the system software, but is in nonvolatile memory. 

• If USAGE<0> is ‘1’, the cluster is reserved for console use and must not 
be allocated by system software. 

• USAGE<63:2> should be zero. 
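

The descriptor fields and USAGE rules listed above can be read with a short routine. The sketch below is illustrative only: the Cluster type, parse_cluster, and decode_usage are hypothetical names, and the descriptor is assumed to be seven little-endian quadwords in the order given in the table.

```python
import struct
from typing import NamedTuple


class Cluster(NamedTuple):
    pfn: int              # +00 starting PFN
    pages: int            # +08 number of pages
    tested_pages: int     # +16 number of tested pages
    bitmap_va: int        # +24 bitmap virtual address, zero if untested
    bitmap_pa: int        # +32 bitmap physical address, zero if untested
    bitmap_checksum: int  # +40 bitmap checksum
    usage: int            # +48 usage bits


def parse_cluster(raw, offset=0):
    """Unpack one 56-byte memory cluster descriptor."""
    return Cluster(*struct.unpack_from("<7Q", raw, offset))


def decode_usage(usage):
    """Classify a cluster from USAGE<0> and USAGE<1>."""
    if usage & 1:
        return "console"          # reserved; system software must not allocate
    if usage & 2:
        return "nonvolatile"      # available, but in nonvolatile memory
    return "available"


raw = struct.pack("<7Q", 0x100, 0x40, 0x40, 0, 0, 0, 1)
cluster = parse_cluster(raw)
print(cluster.pfn, cluster.pages, decode_usage(cluster.usage))
```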


3.4.1.3 Bootstrap Address Space 


All system software, including the primary bootstrap image, runs in a virtual memory environ- 
ment. The console creates the initial page tables that define the initial bootstrap address space 
for the primary bootstrap. System software may replace this bootstrap address space at any 
time after the console passes control to the primary bootstrap image. 


The bootstrap address space consists of four regions. All regions must be located in good mem- 
ory within clusters that are available to system software. The regions are: 


Region 0 


This region maps all console or PALcode data structures that must be shared with system soft- 
ware. These structures include the HWRPB in its entirety, all blocks located by HWRPB 
offsets, the console callback routines, and all memory bitmaps. Region 0 begins at address 
256MB, virtual address 0000 0000 1000 0000₁₆. The starting address of the HWRPB is the 
base of Region 0. 


Region 1 


The primary bootstrap image is loaded into this region. The region must be at least large 
enough to load system software plus three pages. The three additional pages are used as an ini- 
tial bootstrap stack and stack guard pages. The stack guard pages are virtually adjacent to the 
bootstrap stack page and marked no-access. All other pages in the region are mapped and 
valid. Region 1 begins at address 512MB, virtual address 0000 0000 2000 0000₁₆. 


Software Note: 
This region must be set to the size of the primary bootstrap image plus 3 pages for 
OpenVMS Alpha and at least 256K bytes for DIGITAL UNIX. 


Region 2 


This region, or "page table space," contains the bootstrap address space page tables. Region 2 
begins at address 1GB, virtual address 0000 0000 4000 0000₁₆. The range depends on the 
page size: 


Page Size    Page Table Space Address Range 

8KB          1GB to 1GB+8MB 
16KB         1GB to 1GB+16MB 
32KB         1GB to 1GB+32MB 
64KB         1GB to 1GB+64MB 


This region includes the Level 2 and Level 3 page tables used to map all three regions compris- 
ing bootstrap address space. The Level 2 page table maps itself as a Level 3 page table. The 
address of the Level 2 page table page and the PTE within the page that is used for self-map- 
ping also depend on the page size. 


Page Size    Virtual Address of     L2PTE Number 
             Level 2 Page Table     Used for Self-Mapping 

8KB          1GB+1MB                128 
16KB         1GB+512KB              32 
32KB         1GB+256KB              8 
64KB         1GB+128KB              2 


Implementation Note: 


Region 2 allows the primary bootstrap code to start with 32-bit pointers that execute in a 
32-bit context. Thus, Region 2 allows primary bootstrap software to be written with 
32-bit-oriented language compilers. 


Region 3 

This region maps the entire page table structure, including all levels of page table, that would 
be required to map the entire virtual address space supported by this implementation. The 
most significant page table is self-mapped by the second PTE in the page. If booted with a 
three-level page table configuration, this is the Level 1 page table. If booted with a four-level 
page table configuration, this is the Level 0 page table. 


Region 3 exists to support virtual page table lookup for Translation Buffer misses. Region 3 
exists at a virtual address that is inaccessible to code that is compiled to support only a 32-bit 
virtual address space. As such, Region 3 is not the primary page table space that is presented 
to bootstrap software. 


Programming Note: 


Due to the self-mapping, Region 3 maps all page table pages. The Level 2 and Level 3 
page table pages are in both Region 2 and Region 3. 


Page Size    Virtual Address of Page Table Space (VPTB) 
             (Three-Level PT)    (Four-Level PT) 

8KB          8GB                 8TB 
16KB         64GB                128TB 
32KB         512GB               2048TB 
64KB         4TB                 32768TB 
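

The VPTB values follow from the self-mapping through the second PTE of the most significant page table: the base of Region 3 is the virtual address whose top-level index is 1 and whose other fields are zero. The sketch below reproduces the values under that reading; the vptb function is a hypothetical helper, and it assumes 8-byte PTEs so that each level indexes page_size / 8 entries.

```python
def vptb(page_size, levels):
    """Virtual address whose top-level page table index is 1 and all else zero."""
    offset_bits = page_size.bit_length() - 1    # byte-within-page field width
    index_bits = offset_bits - 3                # log2(page_size / 8) PTEs per page
    return 1 << (offset_bits + (levels - 1) * index_bits)


GB = 1 << 30
TB = 1 << 40
for size_kb in (8, 16, 32, 64):
    size = size_kb * 1024
    print(f"{size_kb}KB", vptb(size, 3) // GB, "GB", vptb(size, 4) // TB, "TB")
```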
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Figure 3-4: Initial Virtual Memory Regions 
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All valid pages allow read/write access from kernel mode and deny all access from other 
modes. All fault bits (FOR, FOW, FOE) are clear, as well as Address Space Match (ASM) and 
Granularity Hint (GH). 


The self-mapping of the Level 2 page table excludes any higher-level page table from Region 
2. If operating in the mode of three levels of page table, this is just the Level 1 page table. In 
this case, the Level 1 page table has two active PTEs. The first L1PTE points to the PFN of 
the Level 2 page table page, which maps page table space (Region 2). The second L1PTE con- 
tains the PFN of the Level 1 page table itself, thus defining Region 3. Only these two entries 
within the Level 1 page table are valid; all other Level 1 PTEs are zero. 


If operating in the mode of four levels of page table, both a Level 0 and Level 1 page table 
exist and both are excluded from Region 2. In this case, the Level 0 page table has two active 
PTEs. The first L0PTE points to the PFN of the Level 1 page table page. The second L0PTE 
contains the PFN of the Level 0 page table itself, thus defining Region 3. The Level 1 page 
table has one active PTE. The first L1PTE points to the PFN of the Level 2 page table page, 
which maps page table space (Region 2). Only the first two entries within the Level 0 page 
table, and the first entry within the Level 1 page table, are valid; all other Level 0 and Level 1 
PTEs are zero. 


Figure 3-5: Three-Level Initial Page Tables 

(Figure labels: PTBR, Level 1 PT, Level 2 PT, and Level 3 PTs that map Region 0 at 
VA=256 MB, Region 1 at VA=512 MB, and the page table at VA=1 GB.) 

The Level 2 PT maps Region 2 (page table space) at 1 GB. The Level 2 PT maps itself 
as its own Level 3 PT. The Level 1 PT is not mapped. 


The self-mapping of the Level 2 page table also causes the addresses of the Level 2 and Level 
3 PTEs for a given virtual address to be functions of that address. For every virtual address 
within the bootstrap address space, there is exactly one location within page table space for the 
Level 2 PTE that maps that virtual address, and exactly one location for the Level 3 PTE that 
maps that virtual address. 


Thus, the Level 2 and Level 3 PTE virtual addresses for a given virtual address (VA) within 
bootstrap address space can be calculated given the page size. The following bit range defini- 
tions provide convenient notation for referring to the constituent parts of a virtual address. For 
example, VA<L2> is equivalent to VA<32:23> for an 8K byte page size. 


Page Size    L0¹      L1       L2       L3 

8KB          52:43    42:33    32:23    22:13 
16KB         57:47    46:36    35:25    24:14 
32KB         62:51    50:39    38:27    26:15 
64KB         63:55    54:42    41:29    28:16 


¹ L0 pertains only to the mode where four levels of page table have been configured. 


The base of page table space is a constant value: 

1. PT_Base = 1GB 

The virtual address of the Level 3 PTE (L3PTE_VA) of any virtual address (VA) is 
given by: 

2. L3PTE_VA(VA) = PT_Base + (page_size * VA<L2>) + (8 * VA<L3>) 

Thus, the virtual address of the Level 3 PTE that maps the lowest address of page 
table space is given by: 

   L3PTE_VA(PT_Base) = PT_Base + (page_size * PT_Base<L2>) 

Since the Level 2 page table is self-mapped, the above is also the base virtual address 
of the Level 2 page table. Thus: 

3. L2PT_Base = PT_Base + (page_size * PT_Base<L2>) 

Finally, the virtual address of the Level 2 PTE (L2PTE_VA) of any virtual address 
(VA) is given by: 

   L2PTE_VA(VA) = L2PT_Base + (8 * VA<L2>) 

4. L2PTE_VA(VA) = PT_Base + (page_size * PT_Base<L2>) + (8 * VA<L2>) 
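

The four equations can be checked numerically. The following sketch is a direct transcription under one assumption: the L2 and L3 field positions are derived from the page size (8-byte PTEs, so each field is log2(page_size / 8) bits wide), which matches the bit ranges tabulated above. The function names are hypothetical.

```python
PT_BASE = 1 << 30  # equation 1: base of page table space is 1GB


def va_field(va, page_size, level):
    """Extract VA<L2> (level=2) or VA<L3> (level=3) for the given page size."""
    offset_bits = page_size.bit_length() - 1
    index_bits = offset_bits - 3
    shift = offset_bits + (3 - level) * index_bits
    return (va >> shift) & ((1 << index_bits) - 1)


def l3pte_va(va, page_size):
    """Equation 2: virtual address of the Level 3 PTE that maps va."""
    return PT_BASE + page_size * va_field(va, page_size, 2) + 8 * va_field(va, page_size, 3)


def l2pt_base(page_size):
    """Equation 3: base virtual address of the self-mapped Level 2 page table."""
    return PT_BASE + page_size * va_field(PT_BASE, page_size, 2)


def l2pte_va(va, page_size):
    """Equation 4: virtual address of the Level 2 PTE that maps va."""
    return l2pt_base(page_size) + 8 * va_field(va, page_size, 2)


page = 8 * 1024
region0 = 256 << 20   # base of Region 0 (256MB)
print(hex(l2pt_base(page)), hex(l3pte_va(region0, page)), hex(l2pte_va(region0, page)))
```

For an 8KB page size, PT_Base<L2> evaluates to 128, so the Level 2 page table sits at 1GB+1MB, which agrees with the self-mapping values given for Region 2.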
3.4.1.4 Bootstrap Flags 


The Bootstrap-in-Progress (BIP) and Restart-Capable (RC) processor state flags in the primary 
processor’s per-CPU slot are used to detect failed bootstraps. If the primary re-enters console 
I/O mode while the BIP flag is set and the RC flag is clear, the bootstrap attempt fails, and the 
subsequent console action is determined by Figure 3-1. 


The console sets the BIP flag and clears the RC flag before transferring control to system soft- 
ware. System software sets the RC flag to indicate that sufficient context has been established 
to handle a restart attempt. System software clears the BIP flag to indicate that the bootstrap 
operation has been completed. The RC flag should be set before clearing the BIP flag. Table 3— 
6 gives the console interpretation of BIP and RC flags. 


Table 3-6: Console Interpretation of BIP and RC flags 


BIP      RC       Interpretation at Entry to Console I/O Mode 

set      clear    Failed bootstrap 
set      set      Halt condition encountered during bootstrap, restart processor 
clear    clear    Failed restart 
clear    set      Halt condition encountered, restart processor 
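

The console's interpretation of the two flags amounts to a four-entry lookup. The sketch below restates the table as code; the function name and the result strings are illustrative only and are not console messages.

```python
def interpret_flags(bip, rc):
    """Map the BIP and RC flags, as seen at entry to console I/O mode, to an outcome."""
    outcomes = {
        (True, False): "failed bootstrap",
        (True, True): "halt during bootstrap; restart processor",
        (False, False): "failed restart",
        (False, True): "halt; restart processor",
    }
    return outcomes[(bool(bip), bool(rc))]


# State just after the console transfers control: BIP set, RC clear.
print(interpret_flags(bip=True, rc=False))
```

Because system software sets RC before clearing BIP, the (clear, clear) state is not passed through during a normal bootstrap.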


3.4.1.5 Loading of System Software 


The console is responsible for loading system software at the base of Region 1 beginning at 
virtual address 512MB. This software is expected to be a primary bootstrap program that is 
responsible for loading other system software, but may be diagnostic or other special-purpose 
software. Section 3.6 contains descriptions of the format of each supported bootstrap medium. 


The console uses the BOOT_DEV environment variable to determine the bootstrap device and 
the path to that device. Environment variables contain lists of bootstrap devices and paths; 
each list element specifies the complete path to a given bootstrap device. If multiple elements 
are specified, the console attempts to load a bootstrap image from each in turn. 


The console uses the BOOTDEF_DEV, BOOT_DEV, and BOOTED_DEV environment vari- 
ables as follows: 


• At console initialization, the console sets the BOOTDEF_DEV and BOOT_DEV envi- 
ronment variables to be equivalent. The format of these environment variables depends 
on the console implementation and is independent of the console presentation layer; the 
value may be interpreted and modified by system software. 


• When a bootstrap results from a BOOT command that specifies a bootstrap device list, 
the console uses the list specified with the command. The console modifies 
BOOT_DEV to contain the specified device list. 


Note: 
This may require conversion from the presentation layer format to the registered 
format. 


• When a bootstrap is the result of a BOOT command that does not specify a bootstrap 
device list, the console uses the bootstrap device list contained in the BOOTDEF_DEV 
environment variable. The console copies the value of BOOTDEF_DEV to 
BOOT_DEV. 


• When a bootstrap is not the result of a BOOT command, the console uses the bootstrap 
device list contained in the BOOT_DEV environment variable. The console does not 
modify the contents of BOOT_DEV. 


• The console attempts to load a bootstrap image from each element of the bootstrap 
device list. If the list is exhausted before successfully transferring control to system 
software, the bootstrap attempt fails and the subsequent console action is determined by 
Figure 3-1. 


• The console indicates the actual bootstrap path and device used in the BOOTED_DEV 
environment variable. The console sets BOOTED_DEV after loading the primary boot- 
strap image and before transferring control to system software. The BOOTED_DEV 
format follows that of a BOOT_DEV list element. 


• If the bootstrap device list is empty, BOOTDEF_DEV or BOOT_DEV are NULL 
(00₁₆), and the action is implementation specific. The console may remain in console 
I/O mode or attempt to locate a bootstrap device in an implementation-specific manner. 
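

The rules above for choosing and recording the bootstrap device can be summarized in a short model. This is a sketch under explicit assumptions: environment variables are represented as a Python dictionary of lists (the real format is implementation specific), try_load stands in for the console's load attempt, and the device names in the example are hypothetical.

```python
def select_boot_list(env, boot_command=False, command_list=None):
    """Choose the device list and update BOOT_DEV as the rules above describe."""
    if boot_command and command_list:
        env["BOOT_DEV"] = list(command_list)               # list given on BOOT command
    elif boot_command:
        env["BOOT_DEV"] = list(env.get("BOOTDEF_DEV", []))  # copy the default list
    # Not a BOOT command: BOOT_DEV is used as is and left unmodified.
    return list(env.get("BOOT_DEV", []))


def bootstrap(env, try_load, boot_command=False, command_list=None):
    """Try each list element in turn; record the one that loads in BOOTED_DEV."""
    for device in select_boot_list(env, boot_command, command_list):
        if try_load(device):
            env["BOOTED_DEV"] = device
            return device
    return None  # list empty or exhausted: the bootstrap attempt did not succeed


env = {"BOOTDEF_DEV": ["dka0", "ewa0"], "BOOT_DEV": ["dka0", "ewa0"]}
print(bootstrap(env, lambda dev: dev == "ewa0", boot_command=True))
print(env["BOOTED_DEV"])
```

A None result corresponds to the failed or implementation-specific cases; what the console does next is outside this model.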


The BOOT_FILE and BOOT_OSFLAGS environment variables are used as default values for 
the bootstrap file name and option flags. The console indicates the actual bootstrap image file 
name (if any) and option flags for the current bootstrap attempt in the BOOTED_FILE and 
BOOTED_OSFLAGS environment variables. The BOOT_FILE default bootstrap image 
file name is used whenever the bootstrap requires a file name and either none was specified on 
the BOOT command or the bootstrap was initiated by the console as the result of a major state 
transition. The console never interprets the bootstrap option flags, but simply passes them 
between the console presentation layer and system software. 


3.4.1.6 Processor Initialization 


Before control is transferred to system software, certain IPRs and other processor state must 
be initialized as shown in Table 3-7 and Section 3.3.2.3 for each PALcode variant. Processor 
initialization is performed by the console before booting a processor, before restarting a pro- 
cessor, or as the result of the INITIALIZE -CPU console command. 


The Context Valid (CV) flag in the processor’s per-CPU slot must be set for processor ini- 
tialization to be successful. If the CV flag is clear, the HWPCB contained in the per-CPU slot 
is not valid, and the console must not transfer control to system software. If this or any error 
occurs in initializing the processor, the console retains control of the system and generates the 
binary error message ERROR_PROC_INIT. 


Table 3-7: Processor Initialization


Processor State                                  Initialized State

ASN     Address Space Number                     Zero
ASTEN¹  AST Enable                               ASTEN in processor’s HWPCB
ASTSR¹  AST Summary                              ASTSR in processor’s HWPCB
FEN     Floating Enable                          FEN in processor’s HWPCB
IPL     Interrupt Priority Level                 Highest
MCES    Machine Check Error Summary              8 (bit 3=1)
PCBB    Privileged Context Block                 Address of processor’s HWPCB
PS      Processor Status                         IPL=highest, VMM=0, CM=K, SW=0
PTBR    Page Table Base Register                 PFN value in processor’s HWPCB
SISR¹   Software Interrupt Summary               Zero
WHAMI   Who-Am-I                                 CPU identifier
SCC¹    System Cycle Counter                     Zero
SP      Kernel Stack Pointer                     KSP in processor’s HWPCB
Other IPRs                                       UNPREDICTABLE
Cache, instruction buffer, or write buffer       Empty or valid
Translation buffer                               Invalidated
Main memory                                      Unaffected
Integer and floating-point registers             Unaffected, except SP
Reason for Halt code                             Unaffected
BIP and RC flags                                 Unaffected
Environment variables                            Unaffected

¹ OpenVMS Alpha only.
3.4.1.7 Transfer of Control to System Software


Before transferring control to system software, the console must define valid hardware privi- 
leged context for that software. The console builds that context in the hardware privileged 
context block (HWPCB) in the primary processor’s per-CPU slot. The initialized context is
summarized in Table 3-8 and Section 3.3.2.3 for each PALcode variant. 


The initial KSP points to the lowest addressed quadword in the higher addressed stack guard 
page (top-of-stack) of Region 1 of the bootstrap address space. The PTBR points to the Level 
1 page table page. All other scalar and floating-point register contents are UNPREDICTABLE. 


After building the HWPCB for the primary processor, the console sets the Context Valid (CV) 
flag in the primary’s per-CPU slot. All other bootstrap information is passed from the console 
to system software by environment variables. See Section 2.2 for more details. 


Table 3-8: Initial HWPCB contents


HWPCB Field        Initialized State

KSP                Top-of-stack (contents of SP)
ESP¹               UNPREDICTABLE
SSP¹               UNPREDICTABLE
USP                UNPREDICTABLE
PTBR               PFN of Level 1 page table
ASN                Zero
ASTSR¹             Zero
ASTEN¹             Zero (all disabled)
FEN                Zero (disabled)
PCC                Zero
Unique value       Zero
PALcode scratch    Implementation specific

¹ OpenVMS Alpha systems only.


Control is transferred to system software in kernel mode at the highest IPL with virtual
memory management enabled. Control is transferred to the first longword of the system software
image loaded into Region 1, virtual address 0000 0000 2000 0000₁₆. Before transferring
control, the console ensures that the SP contains the KSP value in the HWPCB. System software
should assume that the stack is initially empty.


The transfer of control transitions the primary processor from the halted state into the running 
state and from console I/O mode into program I/O mode. The rest of the uniprocessor boot- 
strap process is the responsibility of system software. 
3.4.2 Warm Bootstrapping in a Uniprocessor Environment


The actions of the console on a warm bootstrap are a subset of those for a cold bootstrap. A
system bootstrap will be a warm bootstrap whenever the BOOT_RESET environment variable
is set to "OFF", and console internal state permits.


The console performs the following steps in the warm bootstrap sequence:
1. Locate and validate the Hardware Restart Parameter Block (HWRPB)
2. Locate and load the system software primary bootstrap image
3. Initialize processor state on all processors
4. Initialize bootstrap page tables and map initial regions
5. Transfer control to the system software primary bootstrap image


At warm bootstrap, the console does not load PALcode, does not modify the Memory Data
Descriptor table, and does not reinitialize any environment variables. If the console cannot
locate and validate the previously initialized HWRPB, the console must initiate a cold
bootstrap. Before beginning a bootstrap, the console must clear any internally pended restarts to
any processor.


Programming Note:


Warm bootstrap permits system software to preserve limited context across bootstraps.


3.4.2.1 HWRPB Location and Validation


After console initialization, the console must preserve the location of the HWRPB in an imple- 
mentation-specific manner. On warm bootstraps and restarts, the console locates the HWRPB 
and verifies it by ensuring that: 


1. The first quadword of the table contains the physical address of the table.

2. The second quadword of the table contains "HWRPB" (0000 0042 5052 5748₁₆).

3. The quadword at offset HWRPB[288] contains the 64-bit sum, ignoring overflows, of
   the quadwords from offset HWRPB[00] to HWRPB[280], inclusive, relative to the
   beginning of the potential HWRPB.


4. The quadword at offset [0] of the MEMDSC block contains the 64-bit sum, ignoring 
overflows, of the quadwords from MEMDSC+8 through MEMDSC_END of that 
block. The MEMDSC block is located by the MEMDSC offset at HWRPB[200]. See 
Figure 3-2. 


5. As described in Section 2.1.4, if a CONFIG table exists, it is located by the CONFIG 
offset at HWRPB[208]. The quadword at offset [8] of the optional CONFIG table con- 
tains the 64-bit sum, ignoring overflows, of the quadwords from CONFIG+16 through 
CONFIG_END of that table. 


If the HWRPB is not valid and a warm bootstrap is indicated, a cold bootstrap will be performed.


The console must not search memory for a HWRPB; searching memory constitutes a security 
hole. 
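
The first three validation checks can be sketched as follows. This is a minimal illustration, not console code: it assumes the candidate HWRPB has been copied into a bytes-like image from the location the console preserved (never found by searching memory), that quadwords are stored little-endian, and that the function and constant names are hypothetical. The MEMDSC and CONFIG checksums are omitted because their extents depend on structures defined elsewhere in the manual.

```python
import struct

HWRPB_SIGNATURE = 0x0000004250525748   # "HWRPB" as a little-endian quadword
CHECKSUM_OFFSET = 288                  # HWRPB[288] holds the checksum
MASK64 = (1 << 64) - 1


def quadword(image, offset):
    """Read one little-endian 64-bit quadword from a bytes-like image."""
    return struct.unpack_from("<Q", image, offset)[0]


def hwrpb_checksum(image):
    """64-bit sum, ignoring overflow, of quadwords HWRPB[0]..HWRPB[280]."""
    total = 0
    for offset in range(0, CHECKSUM_OFFSET, 8):
        total = (total + quadword(image, offset)) & MASK64
    return total


def hwrpb_header_valid(image, physical_address):
    """Apply checks 1-3 to a candidate HWRPB image."""
    if len(image) < CHECKSUM_OFFSET + 8:
        return False
    return (quadword(image, 0) == physical_address
            and quadword(image, 8) == HWRPB_SIGNATURE
            and quadword(image, CHECKSUM_OFFSET) == hwrpb_checksum(image))
```

Masking the running total to 64 bits after each addition is what "ignoring overflows" amounts to in a language with unbounded integers.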
3.4.3 Multiprocessor Bootstrapping


Multiprocessor bootstrapping differs from uniprocessor bootstrapping primarily in areas relat- 
ing to synchronization between processors. In a shared memory system, processors cannot 
independently load and start system software; bootstrapping is controlled by the primary 
processor. 


3.4.3.1 Selection of Primary Processor 


The primary processor is selected by the console during system initialization before any access 
to main memory by any processor. Selection of the primary processor may be done in any 
fashion that guarantees choosing exactly one primary processor. 


Once a primary processor has been selected, the secondary processors take no further action 
until appropriately notified by the primary processor. In particular, secondary processors must 
not access main memory. 


3.4.3.2 Actions of Console 


After selection, the console proceeds to bootstrap the primary processor, following the normal
uniprocessor bootstrap described in Section 3.4.1.


The console must correctly initialize all HWRPB fields used for synchronization or communi- 
cation between the processors. The console must initialize the PRIMARY CPU ID field at 
HWRPB[32], zero the TXRDY and RXRDY bitmasks at HWRPB[296] and HWRPB[304], 
and recompute the HWRPB checksum at HWRPB[288]. 


The console must also initialize each per-CPU slot for the secondary processors. The console
must:

• Clear the BIP, RC, OH, and CV flags
• Clear the Halt Request code field
• Set the PP flag if the processor is present
• Set the PA flag if the processor is present and available for use by system software
• Set the PMV and PL flags if the console has loaded PALcode on this processor
• Set the PV flag if the console has initialized PALcode on this processor
• Set the PE processor variation flag if the processor is eligible to become a primary


After initializing each processor’s per-CPU slot, the console must notify each console second- 
ary processor of the existence and location of the valid HWRPB. 
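
The flag settings listed above can be sketched as follows. The flag names come from the text; the bit positions produced by auto() are placeholders, and the class and function names are hypothetical. The Halt Request code field and the PE processor variation flag live in other fields of the slot and are left out of this sketch.

```python
from enum import Flag, auto


class SlotFlag(Flag):
    """Per-CPU slot flags named in the text; bit positions are placeholders."""
    BIP = auto()   # Bootstrap-in-Progress
    RC = auto()    # Restart-Capable
    OH = auto()    # cleared by the console (expansion not given in this section)
    CV = auto()    # Context Valid
    PP = auto()    # set if the processor is present
    PA = auto()    # Processor Available
    PMV = auto()   # PALcode Memory Valid
    PL = auto()    # PALcode Loaded
    PV = auto()    # PALcode Valid


def init_secondary_flags(present, available, pal_loaded, pal_initialized):
    """Build the initial flag set for one secondary's per-CPU slot."""
    flags = SlotFlag(0)               # BIP, RC, OH, and CV start out clear
    if present:
        flags |= SlotFlag.PP
        if available:                 # PA requires present AND available
            flags |= SlotFlag.PA
    if pal_loaded:
        flags |= SlotFlag.PMV | SlotFlag.PL
    if pal_initialized:
        flags |= SlotFlag.PV
    return flags
```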


3.4.3.3 PALcode Loading on Secondary Processors 


Most console implementations load PALcode on all secondary processors before bootstrap- 
ping the primary processor. Console implementations may delay the loading or initialization 
of PALcode on a secondary processor. If delayed, PALcode loading and initialization require 
the cooperation of system software executing on the running primary and the console execut- 
ing on behalf of the secondary. 


The console secondary must have performed any necessary initialization as described in
Section 3.4.3.5. All interprocessor console communications follow the mechanisms described in
Section 2.4.


The following procedure applies only to initial PALcode loading on a console secondary. The 
PALcode variant to be loaded must be identical to that of the running primary processor 
before any PALcode switching by system software. This procedure cannot be used to load 
operating system-specific PALcode variants: 


1. The console secondary initializes the PALcode memory and scratch space length fields
   in its per-CPU slot.

2. The console secondary sets the PALcode major revision, minor revision, and
   compatibility subfields in the PALcode revision field in its per-CPU slot.

3. The console secondary notifies the primary that PALcode loading is requested by
   transmitting a message to the running primary as described in Section 2.4.

4. The console secondary polls the PALcode Memory Valid (PMV) flag in its per-CPU
   slot.

5. The running primary detects the console secondary request.

6. The running primary verifies that the Processor Available (PA) flag is set in the
   secondary’s per-CPU slot. If the flag is not set, the operation fails.

7. The running primary compares the major and minor revision subfields of the PALcode
   revision field in its per-CPU slot to that in the secondary’s per-CPU slot. If the revision
   levels do not match, the running primary proceeds to step 12.

8. The running primary compares the number of processors currently sharing its PALcode
   image to the maximum contained in the subfield of the PALcode revision field of its
   per-CPU slot. If the current number is the maximum, no additional console secondary
   can share the PALcode image. The running primary proceeds to step 12.

   Programming Note:
   The running primary can determine the number of processors currently sharing a
   given PALcode image by counting the number of per-CPU slots with the same
   valid PALcode memory space descriptors. A PALcode memory space descriptor
   is valid if the PALcode Loaded (PL) flag is set in the per-CPU slot.

9. The running primary copies the PALcode memory and scratch space descriptors from
   its per-CPU slot into the secondary’s per-CPU slot.

10. The running primary copies the PALcode variation, compatibility, and maximum
    number of processors subfields of the PALcode revision field from its per-CPU slot into the
    secondary’s per-CPU slot.

11. The running primary sets the PALcode Loaded (PL) flag in the secondary’s per-CPU
    slot, then proceeds to step 13.

12. The running primary allocates physical memory for PALcode memory and scratch
    areas and records the addresses in the secondary’s per-CPU slot.

13. The running primary sets the PALcode Memory Valid (PMV) flag in the secondary’s
    per-CPU slot.

14. The console secondary observes that the PMV flag is set in its per-CPU slot.

15. If the PL flag in its per-CPU slot is not set, the console secondary loads PALcode into
    the allocated PALcode memory and scratch space. In this case, the console secondary
    sets the PALcode Loaded (PL) flag in its per-CPU slot.

16. The console secondary ensures that any required implementation-specific PALcode
    initialization is performed.

17. The console secondary sets the PALcode Valid (PV) flag in the secondary’s per-CPU
    slot.


The PALcode memory and scratch space must be page aligned. If not allocated by the console 
before system bootstrap, the allocation management of PALcode memory for secondary pro- 
cessors is the responsibility of system software. 


It is the responsibility of console and system software to ensure that the initially loaded PAL- 
code variation and revision levels of all processors are compatible. This may be performed by 
the primary before starting the secondary, by the starting secondary, or any combination 
thereof. PALcode images of the same PALcode variation but different revision levels are com- 
patible if the PALcode revision compatibility subfields match. 
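
The running primary's part of the loading procedure (steps 6 through 13) can be sketched as follows. This is a simplified model under stated assumptions: the Slot class, its field names, and the allocate callback are hypothetical, and the PALcode memory and scratch descriptors are reduced to single addresses. The interprocessor messaging of Section 2.4 and the secondary's own steps are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical model of the per-CPU slot fields used in steps 6-13."""
    pa: bool = False       # Processor Available
    pl: bool = False       # PALcode Loaded
    pmv: bool = False      # PALcode Memory Valid
    major: int = 0         # PALcode major revision
    minor: int = 0         # PALcode minor revision
    variation: int = 0     # PALcode variation
    compat: int = 0        # PALcode compatibility
    max_share: int = 1     # maximum processors sharing one image
    pal_mem: int = 0       # PALcode memory descriptor (address only)
    pal_scratch: int = 0   # PALcode scratch descriptor (address only)


def count_sharers(slots, primary):
    """Count slots whose valid (PL set) descriptors match the primary's."""
    wanted = (primary.pal_mem, primary.pal_scratch)
    return sum(1 for s in slots if s.pl and (s.pal_mem, s.pal_scratch) == wanted)


def service_pal_request(primary, secondary, slots, allocate):
    """Steps 6-13 as performed by the running primary; returns False on failure."""
    if not secondary.pa:                                       # step 6
        return False
    same_rev = (primary.major, primary.minor) == (secondary.major, secondary.minor)
    if same_rev and count_sharers(slots, primary) < primary.max_share:  # steps 7-8
        secondary.pal_mem = primary.pal_mem                    # step 9
        secondary.pal_scratch = primary.pal_scratch
        secondary.variation = primary.variation                # step 10
        secondary.compat = primary.compat
        secondary.max_share = primary.max_share
        secondary.pl = True                                    # step 11
    else:
        secondary.pal_mem, secondary.pal_scratch = allocate()  # step 12
    secondary.pmv = True                                       # step 13
    return True
```

When the secondary later observes PMV set, a clear PL flag tells it that the image was not shared and that it must load PALcode into the newly allocated memory itself (step 15).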


3.4.3.4 Actions of the Running Primary 


System software executing on the primary processor must initialize the HWPCB for each
secondary processor. The HWPCB contains the necessary privileged context for the execution of
system software and successful restarts. The HWPCB must be initialized before requesting
that the console secondary perform any START command. After initializing the HWPCB,
system software sets the Context Valid (CV) flag.


Once the PALcode is valid on a console secondary, the secondary waits for a START (or
other) command from the running primary. System software issues the necessary console
commands that instruct the secondary to begin executing software. The exchange of commands
and messages between the running primary and a secondary is described in Section 2.4.


System software may start secondary processors at any time. In particular, secondary
processors may be started before or after switching PALcode on the running primary. If system
software switches to an operating system-specific PALcode before starting a secondary
processor, system software must update the PALcode revision field in the per-CPU slot (SLOT[168])
of each secondary before starting the secondary. See Section 3.3.1.


Programming Note:


All commands sent to a console secondary are implicitly targeted to the secondary.


3.4.3.5 Actions of a Console Secondary


After failing to become the primary, a console secondary uses an implementation-specific 
mechanism to determine when a valid HWRPB has been constructed in main memory. The 
console secondary then locates the HWRPB in an implementation-specific manner. 


Once the HWRPB is located, the secondary locates its per-CPU slot using its CPU ID as an 
index. The secondary verifies that its slot exists by comparing its CPU ID to the number of 
per-CPU slots at HWRPB[144]. If its CPU ID exceeds the number of per-CPU slots, the sec- 
ondary must not leave console mode or continue to access main memory. If PALcode loading 
is necessary, the console secondary follows the procedure given in Section 3.4.3.3.


Once PALcode is valid, the console secondary waits for a START (or other) command from 
the running primary by polling the appropriate flag in the RXRDY bitmask. The exchange of 
commands and messages between the running primary and a secondary is described in Section 
2.4. 


In response to a START command, the console secondary:

1. Verifies that the Context Valid (CV) flag is set in its per-CPU slot.
2. Sets the Bootstrap-in-Progress (BIP) flag in its per-CPU slot.
3. Clears the Restart-Capable (RC) flag in its per-CPU slot.
4. Initializes the processor.
5. If necessary, switches to the system software specific PALcode variant identified in the
   PALcode revision field in the per-CPU slot.
6. Loads the privileged context specified by the HWPCB in its per-CPU slot.
7. Loads the procedure value at HWRPB[264] into R27.
8. Clears R26 and R25.
9. Loads the virtual page table base (VPTB) register with the value stored in
   HWRPB[120].
10. Transfers control to the CPU Restart routine, whose virtual address is stored in
    HWRPB[256].


The CV flag indicates that the HWPCB in the slot contains valid hardware privileged state for
system software. If the CV flag is not set, the processor remains in console I/O mode.


The console uses the PALcode revision field in the per-CPU slot to determine if system soft- 
ware has switched PALcode to a system software-specific variant. The console must restore 
that variant before passing control to the CPU restart routine. 


3.4.3.6 Bootstrap Flags 


The Bootstrap-in-Progress (BIP) and Restart-Capable (RC) processor state flags in the console 
secondary processor’s per-CPU slot are used to control error recovery during secondary starts. 
If the secondary re-enters console I/O mode while the BIP flag is set and the RC flag is clear, 
the start attempt fails. Failed starts are equivalent to failed bootstraps, and the subsequent
console action is determined by Figure 3-1. See Section 3.4.1.3 and Table 3-6.


3.4.4 Addition of a Processor to a Running System 


A processor may be added to a running system at any time if a slot has been provided for it in
the HWRPB. The new console secondary processor follows the secondary start procedure
given in Sections 3.4.3.3 and 3.4.3.5, with one minor difference. If no PALcode loading is
necessary, the console secondary sends a ?STARTREQ? message to the running primary. This
message notifies the primary that a new processor has been added to the configuration. After
sending the ?STARTREQ? message, the console secondary waits for a START (or other)
command from the running primary. See Section 2.4 for a description of interprocessor console
communication.
3.4.5 System Software Requested Bootstraps


System software can request that the console perform a system bootstrap. This request can be 
made on any processor in a multiprocessor system and overrides the setting of the 
AUTO_ACTION and BOOT_RESET environment variables. 


To request a bootstrap, system software sets one of the bootstrap request codes in the
Halt Request field of its per-CPU slot, then executes a CALL_PAL HALT instruction. To request
a cold bootstrap, the "Cold Bootstrap Requested" code (‘2’) is set; to request a warm
bootstrap, the "Warm Bootstrap Requested" code (‘3’) is set.


Instead of initiating the normal error halt processing described in Section 3.5.4, the console
initiates the appropriate system bootstrap as described in Sections 3.4.1 and 3.4.2. The bootstrap
attempt is unconditional; the AUTO_ACTION and BOOT_RESET environment variables
do not affect the bootstrap attempt.
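
The console's dispatch on the Halt Request field can be sketched as follows. The two code values come from the text; the enumeration, the function name, and the returned strings are hypothetical, and every other Halt Request value is collapsed into the normal error halt path for simplicity.

```python
from enum import IntEnum


class HaltRequest(IntEnum):
    COLD_BOOTSTRAP = 2   # "Cold Bootstrap Requested"
    WARM_BOOTSTRAP = 3   # "Warm Bootstrap Requested"


def console_action_on_halt(halt_request):
    """Map the Halt Request field to the console's next action."""
    if halt_request == HaltRequest.COLD_BOOTSTRAP:
        return "cold bootstrap"      # Section 3.4.1
    if halt_request == HaltRequest.WARM_BOOTSTRAP:
        return "warm bootstrap"      # Section 3.4.2
    return "error halt processing"   # Section 3.5.4; other codes not modelled
```

Note that the function takes no AUTO_ACTION or BOOT_RESET argument: a requested bootstrap is unconditional.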


3.5 System Restarts 


The console is responsible for restarting a processor halted by powerfail or by error halt. The 
console follows the same sequence for a primary or secondary processor. 


3.5.1 Actions of Console 


The console begins the restart sequence by locating and then validating the HWRPB, using the 
procedure given in Section 3.4.2.1. If the HWRPB is not valid, the restart attempt fails. See 
Section 3.1.1 for console actions at major state transitions. 


If the HWRPB is valid, the console uses the processor CPU ID as an index to calculate the 
address of that processor’s HWRPB slot. The console: 


1. Verifies that the processor’s PALcode Valid (PV) flag is set. If the PV flag is clear,
   PALcode is not valid, and the restart attempt fails.

2. Verifies that the processor’s Context Valid (CV) flag is set. If the CV flag is clear, the
   HWPCB does not contain valid software context for the restart, and the restart attempt
   fails.

3. If the Reason for Halt is anything other than "powerfail restart", the console examines
   the processor’s Restart-Capable (RC) flag. If RC is set, the console proceeds with the
   restart at step 5. If RC is clear, system software is not capable of attempting the restart,
   and the restart attempt fails.

   Ignoring the RC flag for a powerfail restart avoids unnecessary bootstraps caused by
   repeated power failures, which in turn are caused by a bouncing power supply
   that prevents software from having sufficient time to set the RC flag.

4. Examines the Bootstrap-in-Progress (BIP) flag. If BIP is clear, and the
   AUTO_ACTION environment variable is "BOOT", a system bootstrap is attempted.
   Otherwise, the processor remains in console I/O mode. See Figure 3-1.

5. Examines the PALcode revision field in its per-CPU slot. If the revision field does not
   match the PALcode revision in use by the console, the console must switch PALcode
   before passing control to the CPU Restart routine.

6. Loads the privileged context specified by the HWPCB in its per-CPU slot.

7. Loads the procedure value at HWRPB[264] into R27.

8. Clears R26 (return address) and R25 (argument information).

9. Loads the virtual page table base (VPTB) register with the value stored in
   HWRPB[120].

10. Transfers control to the CPU Restart routine, whose virtual address is stored in
    HWRPB[256].


On all restart attempt failures, the console initiates the action indicated by Figure 3-1. The PV
and CV flags should never be clear for the primary processor; if either flag is clear, then the
restart fails. Also, no PALcode or system software is loaded during a restart.


It is the responsibility of system software to complete the restart operation and to set the RC 
flag at the point where a subsequent restart can be handled correctly. 


3.5.2 Powerfail and Recovery — Uniprocessor 


On Alpha systems, the system power supply conditions external power and transforms it for 
use by the processor, memory, and I/O subsystems. Backup options are available on some sys- 
tems to supply power after external power fails. The backup option may supply power to all of 
the system platform hardware or only a subset. The effect of an external power failure 
depends on the backup option: 


• If no backup option exists, the processor cannot be restarted after power is restored. The
  processor must be bootstrapped or left halted in console I/O mode.

• If the backup option maintains power to all of the system platform hardware, execution
  of system software is unaffected by the power failure. It must be possible for system
  software to determine that a transition to backup power has occurred.

• If the backup option maintains only the contents of memory and keeps system time with
  the BB_WATCH, the power supply must request a powerfail interrupt. After requesting
  the interrupt, the power supply must continue to supply power to the processor for an
  implementation-specific period to allow system software to save state.

  Powerfail recovery is possible only if adequate system state is preserved during an
  interruption of power to the processor. System software must save all volatile state
  and perform any operating system-specific actions necessary to ensure later successful
  recovery.


When power is restored, the console determines that the HWRPB is still valid, then examines
the console lock and the AUTO_ACTION environment variable. If the console is locked, and
the AUTO_ACTION environment variable is "RESTART", the console attempts an operating
system restart. See Section 3.1.1.


The processor may lose state when power is lost. For example, if a processor is halted when
power fails, the action on power-up is still determined by the console switches and
environment variables. The system does not necessarily stay halted.
Software Note:


As explained in OpenVMS Alpha Software (II-A), Chapter 6, and DIGITAL UNIX
Software (II-B), Chapter 5, a powerfail interrupt is delivered at an appropriate IPL to the
interrupt service routine located at SCB offset 640₁₆ for that operating system.


3.5.3 Powerfail and Recovery — Multiprocessor 


There are two basic approaches to powerfail recovery on multiprocessor systems:

• United
  All available processors effectively experience the powerfail event identically.

• Split
  Each available processor effectively experiences independent powerfail events.


A processor is "available" if the Processor Available (PA) flag is set in the processor’s 
per-CPU slot. The powerfail system variation flag at HWRPB[88] indicates the type of power- 
fail and restart action. 


A multiprocessor Alpha system that supports powerfail recovery must implement the united 
powerfail mode. The split mode may be implemented optionally as an alternative, selected at 
system bootstrap. 


Software Note: 


OpenVMS Alpha supports only the united powerfail and recovery mode at this time. 
Powerfail recovery is possible only when the primary is restarted; all secondaries should 
remain in console I/O mode. 


3.5.3.1 United Powerfail and Recovery 


In united powerfail and recovery mode, all available processors experience powerfail inter- 
rupts, halts, and restorations uniformly. If one available processor experiences a powerfail 
event, all other available processors experience that event. Therefore, if one processor power- 
fails and recovers, all processors must do so. Even if a separately powered processor does not 
actually lose power, that processor will still receive the powerfail interrupt and must be 
restarted as if power had been lost. 


When power is restored and a restart is to be attempted, the console must determine whether to 
restart all available processors or only the primary processor. The console determines the 
appropriate action by the Powerfail Restart (PR) flag in the system variation field of the 
HWRPB[88]. If the PR flag is set, the console attempts to restart all available processors; if 
PR is clear, the console attempts to restart only the primary processor. In both cases, it is the 
responsibility of system software to coordinate and synchronize further powerfail recovery. 
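
The united-mode choice of which processors to restart can be sketched as follows. The function name and the dictionary used to model the per-CPU Processor Available flags are assumptions made for the example; the sketch covers only the console's selection and not the restart sequence itself.

```python
def processors_to_restart(available, primary_id, pr_flag):
    """Return the CPU IDs the console attempts to restart in united mode.

    available  -- dict mapping CPU ID to its Processor Available (PA) flag
    primary_id -- CPU ID of the primary processor
    pr_flag    -- Powerfail Restart (PR) flag from the HWRPB[88] field
    """
    if pr_flag:
        # PR set: attempt to restart every available processor.
        return sorted(cpu for cpu, pa in available.items() if pa)
    # PR clear: attempt to restart only the primary.
    return [primary_id]
```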


3.5.3.2 Split Powerfail and Recovery 


In split powerfail and recovery mode, only the available processors that actually experience a 
loss of power will experience a powerfail interrupt and subsequent recovery. Available proces- 
sors that are separately powered and do not lose power do not experience a powerfail interrupt. 
When power is restored and a restart is to be attempted, the console must determine whether to
restart any available processor or only the primary processor. As in the united mode, the
console determines the appropriate action by the Powerfail Restart (PR) flag in the system
variation field of the HWRPB[88]. If the PR flag is set, the console attempts to restart any
available processor. If PR is clear, the console attempts to restart only the primary processor;
on a secondary, the console sends the ?STARTREQ? message and waits for a START (or
other) command from the running primary as discussed in Section 3.4.3.5. Again, system
software has the responsibility for further coordination and synchronization of powerfail recovery.


3.5.4 Error Halt and Recovery 


A number of serious error conditions can prevent a processor from executing the current 
thread of software. Such error conditions are detected by PALcode and halt the processor. 


When a halt is encountered, the console must ensure that the processor hardware state is visi- 
ble to the console operator and to system software after a subsequent restart attempt. This state 
includes the current values in PS, PC, SP, PCBB, HWPCB, all integer registers, all
floating-point registers, and the name of the halt condition. The console must:


1. Ensure that the contents of the integer and floating-point registers appear unaffected.

2. Write the current hardware context to the HWPCB located by the current PCBB.

3. Write the current PS, PC, and PCBB register contents into the processor’s per-CPU slot.

4. Write the current R25, R26, and R27 register contents into the processor’s per-CPU
   slot.

5. Set the appropriate code into the Reason for Halt field of the processor’s per-CPU slot.


The values of R25, R26, and R27 must be explicitly saved in the per-CPU slot to permit the 
console to invoke the CPU restart routine. 


Section 3.1.1 and Table 2-4 list the defined halt conditions that transition an Alpha processor 
from the running state to a halted state and that may lead to an attempt to restart the processor. 
Each condition is passed to the operating system in the Reason for Halt quadword of the pro- 
cessor’s HWRPB slot. 


When an error halt occurs, the console examines the console lock setting. If the console is
locked, the console attempts a restart. If unlocked, the console action is determined by the
setting of the AUTO_ACTION environment variable (see Figure 3-1). See Section 3.5.1 for a
description of the restart attempt process.


The processor must be initialized after an error halt. If the processor starts running after an 
error halt without an intervening processor initialization, the operation of the processor is 
UNDEFINED. The effects of processor initialization are summarized in Table 3-7. 
An error halt directly affects only the processor that incurred it, although multiple processors
may simultaneously and coincidentally incur their own error halt conditions. If restarts are 
enabled, each halted processor must be independently restarted by the console. The restarts of 
individual processors may occur in a different order than the error halts occurred, but if the 
console restarts any halted processor, it must restart all halted processors in a timely fashion 
unless a bootstrap is requested in the meantime. A bootstrap nullifies any pending restarts in 
the multiprocessor. 


3.5.5 Operator Requested Crash 


When the operating system does not respond to normal program requests, the console operator 
may request that the console request an operating system crash. A console requested crash dif- 
fers from a console halt of a processor in that system software can write a crash dump. 


The console operator interacts with the console presentation layer and requests the crash with
a HALT -CRASH command. The console converts this command to an error halt restart of
system software. After gaining control of the processor, the console preserves the hardware
state (see Section 3.5.4). The console passes the crash request to system software by using the
"Console Operator requests system crash" code in the Reason for Halt field in the primary’s
per-CPU slot. It is the responsibility of the system software restart routine to initiate the crash
in an implementation-specific fashion.


3.5.6 Primary Switching 


System software may find it necessary to replace the primary processor with one of the run- 
ning secondary processors without bootstrapping the system. This "switch" of the running 
primary may be caused by an error encountered by the primary or by a program request. 
Switching a running primary must be initiated by system software; the console cannot force a 
switch to occur. 


Support for primary switching is optional to system software, console implementations, and 
system platforms. The system platform hardware must permit the selected secondary to 
assume the functions of a primary. The selected secondary must have direct access to the con- 
sole, a BB_WATCH, and all I/O devices. Direct access to the console ensures that the 
secondary can access console I/O devices and the console terminal. Direct access to a 
BB_WATCH ensures that the secondary can act as the system timekeeper. Direct access to all 
I/O devices ensures that the secondary can initiate I/O requests to and receive I/O interrupts 
from all I/O devices, and that the secondary can reinitialize all devices as part of powerfail 
recovery. 


If the processor is eligible to become a primary, the console will set the Primary Eligible (PE) 
processor variation flag in the processor’s per-CPU slot during processor initialization. See 
Table 2-4. 


Primary switching requires cooperation between system software and the console. System soft- 
ware is responsible for the selection of the new primary and any necessary redirection of I/O 
interrupts. The console is responsible for any necessary configuration of the console terminal 
or other console device interface.


3.5.6.1 Sequence on an Embedded Console


The sequence of events differs depending on the type of console implementation. On a system 
with an embedded console, the operation proceeds as follows: 


1. System software performs any actions specific to system software synchronization.


2. System software executing on the old primary ensures that the console terminal is in a
quiescent state. In particular, character reception from the terminal must be suspended.


3. System software selects the new primary. The selected secondary must be eligible as
indicated by the PE processor variation flag in its per-CPU slot.


4. System software executing on the old primary invokes the PSWITCH console callback
specifying the "transition from primary" action.


5. The console attempts to perform any necessary hardware state changes to transform the
old primary into a secondary.


Hardware/Software Coordination Note:
An example of such a hardware state change is disabling a console UART
physically located on the processor board.


6. If the state change is completed, PSWITCH returns success status. System software
may proceed with the primary switch at step 8.


7. If the state change is not effected, PSWITCH returns failure status. System software
must take other appropriate action.


8. System software executing on the old primary notifies system software on the selected
secondary of the successful PSWITCH completion.


9. System software executing on the selected secondary invokes the PSWITCH console
callback specifying the "transition to primary" action.


10. The console verifies that the selected secondary is eligible to become a primary and
attempts to perform any necessary hardware state changes to transform the old secondary
into the new primary.


11. If the state change is completed, PSWITCH returns success status. System software
may proceed with the primary switch at step 13.


12. If the state change is not effected, PSWITCH returns failure status. System software
must select a different potential primary or take other appropriate action.


13. System software executing on the selected secondary reactivates the console terminal.
In particular, character reception from the terminal is re-enabled.


14. System software performs any additional system reconfiguration, updates the PRIMARY
CPU ID field at HWRPB[32], recomputes the HWRPB checksum at
HWRPB[288], and performs any actions specific to system software synchronization.


On a system with a detached console, the operation is similar, but only one call to PSWITCH
is required. Additional calls to PSWITCH with the "switch primary" action may result in
UNDEFINED operation. The operation proceeds as follows:


1. System software performs any actions specific to system software synchronization. 


2. System software executing on the old primary ensures that the console terminal is
in a quiescent state. In particular, character reception from the terminal must be
suspended.


3. System software selects the new primary. The selected secondary must be eligible as 
indicated by the PE processor variation flag in its per-CPU slot. 


4. System software executing on any processor invokes the PSWITCH console callback 
specifying the "switch primary" action and the CPU ID of the new primary. 


5. The console verifies that the selected secondary is eligible to become a primary and 
attempts to perform any necessary hardware state changes to transform the old primary 
into a secondary and to transform the selected secondary into the primary. 


6. If the state change is completed, PSWITCH returns success status. System software 
may proceed with the primary switch at step 9. 


7. If the state change is not effected and the resulting hardware state permits a return to 
system software, PSWITCH returns failure status. System software must select a differ- 
ent potential primary or take other appropriate action. 


8. If the state change is not effected and the resulting hardware state does not permit a 
return to system software, the console takes the action associated with a failed restart. 


9. System software executing on the selected secondary reactivates the console terminal. 
In particular, character reception from the terminal is re-enabled. 


10. System software performs any additional system reconfiguration, updates the PRI- 
MARY CPU ID field at HWRPB[32], recomputes the HWRPB checksum at 
HWRPB[288], and performs any actions specific to system software synchronization. 
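

The difference between the two sequences (two PSWITCH calls on an embedded console, one on a detached console) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the architected callback interface: the `Console` class, the `pswitch` method signature, the `system` dictionary, and its keys are hypothetical names introduced here. Only the three action names and the PE eligibility check come from the text. Recomputing the HWRPB checksum and recovering from a failed call are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Console:
    """Toy console model; class, method, and field names are hypothetical."""
    primary: int                 # CPU ID of the current primary
    pe_flags: dict               # CPU ID -> Primary Eligible (PE) flag
    embedded: bool = True        # False models a detached console
    calls: list = field(default_factory=list)

    def pswitch(self, caller: int, action: str, new_primary: int = -1) -> bool:
        """Return True for success status and False for failure status."""
        self.calls.append(action)
        if self.embedded and action == "transition from primary":
            return caller == self.primary
        if self.embedded and action == "transition to primary":
            if not self.pe_flags.get(caller, False):
                return False
            self.primary = caller
            return True
        if not self.embedded and action == "switch primary":
            if not self.pe_flags.get(new_primary, False):
                return False
            self.primary = new_primary
            return True
        return False


def switch_primary(console: Console, system: dict, new_primary: int) -> bool:
    """Drive the switch from the system software side; True on success."""
    old_primary = console.primary
    # Quiesce the console terminal, then check the candidate's PE flag.
    system["terminal_active"] = False
    if not console.pe_flags.get(new_primary, False):
        system["terminal_active"] = True
        return False
    if console.embedded:
        # Two calls: one on the old primary, one on the selected secondary.
        ok = (console.pswitch(old_primary, "transition from primary")
              and console.pswitch(new_primary, "transition to primary"))
    else:
        # One call from any processor, naming the new primary.
        ok = console.pswitch(old_primary, "switch primary", new_primary)
    if not ok:
        return False  # caller must pick another candidate or recover
    # Reactivate the terminal and record the new PRIMARY CPU ID.
    system["terminal_active"] = True
    system["primary_cpu_id"] = new_primary
    return True
```

Calling `switch_primary` with an embedded `Console` records two PSWITCH actions in `calls`; with `embedded=False` it records exactly one, which matches the warning that additional "switch primary" calls may be UNDEFINED.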


3.5.7 Transitioning Console Terminal State During HALT/RESTART 


Abrupt transitions from program I/O mode to console I/O mode may occur. Such transitions 
may be caused by execution of a CALL_PAL HALT instruction, a catastrophic error, or a con- 
sole operator forcing the processor into console I/O mode. Upon transition to console I/O 
mode, the console must be able to regain control of the console terminal, even though system 
software may have changed the device characteristics. 


The console may seize control of the console terminal without regard to system software when 
the transition is such that no return to program I/O mode is possible. Such transitions are nor- 
mally associated with a catastrophic error. 


If system software execution may be continued, the console must be able to restore the exist- 
ing state of the console terminal. The console must regain and subsequently relinquish control 
of the console terminal with the cooperation of system software.


Hardware/Software Coordination Note:


This is particularly desirable on workstations when the console operator forces the 
processor into console I/O mode. 


System software may provide SAVE_TERM and RESTORE_TERM routines that can be 
called by the console to save and restore the state of the console terminal. To provide these
optional routines, system software loads the SAVE_TERM and RESTORE_TERM starting 
virtual address and procedure descriptor fields in the HWRPB and recomputes the HWRPB 
checksum at HWRPB[288]. At system bootstraps, the console sets these fields to zero. 


The console calls SAVE_TERM and RESTORE_TERM in kernel mode at the highest IPL in 
the memory management policy established by system software. The console loads the routine 
procedure value into R27, clears R25 and R26, and then transfers control to system software at 
the starting virtual address. The procedure value and starting virtual address for SAVE_TERM 
are contained in HWRPB[224] and [232]; those for RESTORE_TERM are contained in 
HWRPB[240] and [248]. These routines are invoked only on the primary processor and only 
upon an unexpected entry into console I/O mode. The console must preserve sufficient hardware
state to permit the processor to be restarted before invoking these routines. See Section
3.5.4. 


Exit from these routines must be accomplished by using the CALL_PAL HALT instruction to 
return the processor to console I/O mode; these routines do not use the RET subroutine return 
instruction. Before exiting, these routines must set the "SAVE_TERM/RESTORE_TERM
exit" code (‘1’) in the Halt Request field of the primary’s per-CPU slot and indicate success
(‘0’) or failure (‘1’) status in R0<63>. The console will not attempt to continue system
software if a failure status is returned.
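

The status convention can be shown with a short decoding sketch. The function name and the 64-bit mask are illustrative assumptions; only the meaning of bit 63 of R0 (clear for success, set for failure) comes from the text.

```python
MASK64 = (1 << 64) - 1   # R0 is a 64-bit register
STATUS_BIT = 63          # R0<63>: 0 = success, 1 = failure


def term_routine_succeeded(r0: int) -> bool:
    """Decode the SAVE_TERM/RESTORE_TERM status returned in R0."""
    return (((r0 & MASK64) >> STATUS_BIT) & 1) == 0
```

A console following this convention would attempt to continue system software only when the function returns `True`.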


SAVE_TERM and RESTORE_TERM may be called when system software has encountered 
an unexpected CALL_PAL HALT or other halt condition; system state may be corrupt. These 
routines must be written with few or no dependencies on possibly corrupt system state. 


Hardware/Software Coordination Note: 


A console terminal on a serial line may or may not have state that needs to be saved. A
console terminal on a workstation may require the system software to "roll down" the
current screen to expose the "console window" and "roll up" the "console window" to
expose the current screen.


3.5.7.1 SAVE_TERM — Save Console Terminal State


Format:
    status = SAVE_TERM
Inputs:
    R27    = Procedure value (HWRPB[232])
Outputs:
    status = R0; status:

        R0<63>    ‘0’  Success, terminal state saved
                  ‘1’  Failure, terminal state not saved

        R0<62:0>  SBZ


SAVE_TERM is called by the console after an unexpected entry to console mode. The routine 
performs any implementation-specific and device-specific actions necessary to save the state 
of the console terminal as established by system software. When the routine exits and console 
I/O mode is restored, the console is free to modify the existing console terminal state in any 
manner. 


3.5.7.2 RESTORE_TERM — Restore Console Terminal State


Format:
    status = RESTORE_TERM
Inputs:
    R27    = Procedure value (HWRPB[248])
Outputs:
    status = R0; status:

        R0<63>    ‘0’  Success, terminal state restored
                  ‘1’  Failure, terminal state not restored

        R0<62:0>  SBZ


RESTORE_TERM is called by the console just before continuing system software. The rou- 
tine performs any implementation-specific and device-specific actions necessary to restore the 
state of the console terminal as established by system software.


3.5.8 Operator Forced Entry to Console I/O Mode


The console operator can force a processor into console I/O mode with a HALT -CPU com- 
mand. When a processor enters console I/O mode in this way, the console sets the Operator 
Halted (OH) flag in its per-CPU slot. The console does not update the Reason for Halt or any 
other processor halt state in its per-CPU slot. The console sets the OH flag only as the result of 
an explicit operator action. The OH flag is not set on transitions to console I/O mode that 
result from error halt conditions, powerfails, CALL_PAL HALT instructions in kernel mode, 
console operator requests of a system crash, or software-directed processor shutdowns. 


The console clears the OH flag before returning to program I/O mode as the result of a
CONTINUE or BOOT command. The console may clear the OH flag if an error halt or
operator-induced condition is encountered that precludes a subsequent CONTINUE command.
Such a condition is treated as an error halt (see Section 3.5.4).


3.6 Bootstrap Loading and Image Media Format 
An Alpha console may load a primary bootstrap image from one or more of the device classes 
listed in Table 3-9. Subsequent sections describe how the console locates, sizes, and loads the
bootstrap image for each device class. 


Table 3-9: Bootstrap Devices and Image Media


Device Class    Data Link    Protocol

Local Disk      N/A          Bootblock
Local Tape      N/A          ANSI, Bootblock
Network         NI, FDDI     MOP, Bootp
ROM             N/A          ROM Bootblock


As explained in Section 3.4.1.4, the console attempts to load a bootstrap image from each ele- 
ment of a bootstrap device list until a successful image load is achieved. If the bootstrap image 
cannot be located or if the load fails for any reason, the console retains control of the system,
generates the binary error message AUDIT_BSTRAP_ABORT, and then attempts to load a 
bootstrap image from the next bootstrap device list element. After a bootstrap image is suc- 
cessfully located and loaded, the console transfers control to system software as described in 
Section 3.4. 
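

The fallback behavior over the bootstrap device list can be sketched as a simple loop. The function and parameter names are hypothetical, and `try_load` stands in for whichever device-class procedure applies; the sketch only shows that each failed element produces AUDIT_BSTRAP_ABORT before the next element is tried.

```python
from typing import Callable, Iterable, Optional


def load_bootstrap(devices: Iterable[str],
                   try_load: Callable[[str], Optional[bytes]],
                   report: Callable[[str], None]) -> Optional[bytes]:
    """Try each bootstrap device list element until one image loads."""
    for device in devices:
        image = try_load(device)       # None models "not located or load failed"
        if image is not None:
            return image               # console would now transfer control
        report("AUDIT_BSTRAP_ABORT")   # console retains control, tries next
    return None                        # list exhausted without a successful load
```

For example, with a two-element list whose first device fails, the function reports one AUDIT_BSTRAP_ABORT and returns the image from the second device.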


As the loading of the bootstrap image proceeds, the console optionally generates an audit trail 
of progress messages. The ENABLE_AUDIT environment variable controls audit trail genera- 
tion. The audit trail begins with the AUDIT_BOOT_STARTS message. The audit trail 
continues with messages that are specific to the bootstrap device. Each consists of a binary 
message code that is interpreted by the console presentation layer.


3.6.1 Disk Bootstrapping


An Alpha primary bootstrap may be loaded from a directly accessed disk device. The console 
loads the "boot block" contained in the first logical block (LBN 0) of the disk. The boot block 
contains the starting logical block number (LBN) of the primary bootstrap program and the 
count of contiguous LBNs that make up that image. 


The first 512 bytes of the boot block are structured as shown in Figure 3-6. The console loads 
the primary bootstrap without knowledge of the operating system file system. The boot block 
is (previously) initialized by the operating system. The actual size of a logical block is 
device-specific and may exceed 512 bytes. 


Figure 3-6: Alpha Disk Boot Block


A local disk bootstrap proceeds as follows:


1. The console reads the boot block from LBN 0 of the specified disk device.


2. The console validates the boot block CHECKSUM; if the checksum is not validated,
the bootstrap image load attempt aborts. The console computes the checksum of the
first 63 quadwords in the block as a 64-bit sum, ignoring overflow. The computation
includes both reserved regions. The computed checksum is compared to the CHECKSUM.


3. The console generates the AUDIT_CHECKSUM_GOOD message if the audit trail is
enabled.


4. The console ensures that the FLAG quadword is zero; otherwise the bootstrap image
load attempt aborts.


5. The console ensures that the COUNT is non-zero; otherwise the bootstrap image load
attempt aborts. The count field indicates the number of contiguous logical blocks that
contain the primary bootstrap.


6. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


7. The console reads the primary bootstrap image specified by COUNT and STARTING
LBN into system memory; if any error occurs, the bootstrap image load attempt aborts.

The transfer begins at the logical block given by the STARTING LBN; a contiguous
COUNT number of logical blocks is read. The image is read into a virtually
contiguous system memory buffer; the starting virtual address is
0000 0000 2000 0000₁₆. (See Section 3.4.1.2.)

Errors include device hardware errors, the specified STARTING LBN not being
present on the disk, or unexpectedly encountering the last logical block on the disk
during the read.


8. The console generates the AUDIT_LOAD_DONE message when the load has
completed; the message is generated only if the audit trail is enabled.


9. The console prepares to transfer control to the bootstrap program as described in
Section 3.4.1.6.


Implementation Notes: 


Unlike the VAX boot block support, no native Alpha code is contained in the boot block; 
the boot block contains only the LBN descriptor for the Alpha primary bootstrap image. 
An Alpha boot block can contain pointers to primary bootstrap images for both VAX and 
Alpha simultaneously. 


Because the boot block includes an LBN and block count, the console need have no
knowledge of the operating system file system or on-disk structure.


The first 136 bytes of the boot block are currently used by the VAX disk boot block 
mechanism. The next 80 bytes are not currently used either by VAX or Alpha boot blocks. 
For future expansions, VAX boot blocks should expand towards higher addresses and 
Alpha boot blocks expand towards lower addresses; each region remains contiguous. 
These 216 bytes are ignored by the Alpha console except for the purposes of computing 
the boot block checksum. 


The boot block FLAGS word is reserved for future expansion. Flag<0> is reserved to 
indicate a discontiguous bootstrap image; Flag <63:1> are reserved for future definition. 
There are no current plans by any DIGITAL operating system to have a discontiguous 
primary bootstrap image. 
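

The checksum, FLAG, and COUNT checks in the disk bootstrap procedure can be sketched as follows. This is a minimal illustration rather than console firmware. The function names are hypothetical, quadwords are assumed to be little-endian, and the quadword positions chosen for COUNT, STARTING LBN, and FLAGS are assumptions because Figure 3-6 is not reproduced here. The 63-quadword sum comes from the text, and the CHECKSUM position at [BB+504] is the one given in Section 3.6.2.2.

```python
import struct

BLOCK_BYTES = 512
MASK64 = (1 << 64) - 1
# Quadword indexes within the 512-byte block. CHECKSUM at index 63 is byte
# offset 504; the other three positions are assumptions for illustration.
COUNT_QW, START_LBN_QW, FLAGS_QW, CHECKSUM_QW = 60, 61, 62, 63


def quadwords(block: bytes) -> tuple:
    """Unpack the first 512 bytes as 64 little-endian unsigned quadwords."""
    return struct.unpack("<64Q", block[:BLOCK_BYTES])


def boot_block_checksum(block: bytes) -> int:
    """64-bit sum of the first 63 quadwords, ignoring overflow."""
    return sum(quadwords(block)[:CHECKSUM_QW]) & MASK64


def parse_disk_boot_block(block: bytes):
    """Return (starting_lbn, count), or None if the load attempt aborts."""
    if len(block) < BLOCK_BYTES:
        return None
    qw = quadwords(block)
    if boot_block_checksum(block) != qw[CHECKSUM_QW]:
        return None          # checksum not validated
    if qw[FLAGS_QW] != 0:
        return None          # FLAG quadword must be zero
    if qw[COUNT_QW] == 0:
        return None          # COUNT must be non-zero
    return qw[START_LBN_QW], qw[COUNT_QW]
```

Because the sum covers all 63 leading quadwords, the 216 bytes that the Alpha console otherwise ignores still affect the result, as the Implementation Notes point out.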


3.6.2 Tape Bootstrapping 


An Alpha primary bootstrap may be loaded from a directly accessed tape device. Before load- 
ing the primary bootstrap, the console must determine the tape format and locate the primary 
bootstrap on the tape. The console: 


1. Rewinds the tape on the specified tape device to the beginning of the tape (BOT). 
2. Reads the first record. 
3. Determines the record length. 


— If the record length is 80 bytes, the tape may be an ANSI-formatted tape. The
console proceeds as described in Section 3.6.2.1.


— If the record length is 512 bytes, the tape is "boot blocked." The console proceeds
as described in Section 3.6.2.2.


— If the length is other than 80 or 512 bytes, the bootstrap image load attempt aborts.


3.6.2.1 Bootstrapping from ANSI-Formatted Tape 


Before loading the primary bootstrap image from an ANSI-formatted tape, the console must 
ensure that the format is valid. To verify that a given record contains a particular ANSI label, 
the console checks for the ASCII label name string at the beginning of the record. For exam- 
ple, a record containing a VOL1 label begins with the ASCII string "VOL1." All other record 
bytes are ignored when verifying the label. 


A primary bootstrap image file name may be specified explicitly on a BOOT command or 
implicitly by the BOOT_FILE environment variable. If no file name is specified, the first file 
located will be used. 


A local ANSI-formatted tape bootstrap proceeds as follows: 


1. The console verifies that the first record contains a VOL1 label; if the verification fails,
the bootstrap image load attempt aborts.


2. The console generates the AUDIT_TAPE_ANSI message if the audit trail is enabled.


3. If no file name was specified, the console advances the tape position to the End-of-Tape
(EOT) side of the first tape mark. The console proceeds to step 5.


4. If a file name was specified, the console attempts to locate that file on the tape. If the
file cannot be located, the attempt to load the bootstrap image aborts. The console
compares the specified file name with the file name present in each HDR1 label on the tape.
At the first match, the console proceeds to step 5.

The console searches for the specified file, starting with the second tape record. The
console reads 80-byte records from the tape until it encounters an HDR1 label, then
proceeds as follows:


A. The console generates the AUDIT_FILE_FOUND<filename> message, where
<filename> is the value of the HDR1 label. The message is generated only if the
audit trail is enabled.


B. The console compares the specified file name with the 17-character File Identifier
Field found in the HDR1 label.


C. If a match occurs, the console advances the tape position to after the next tape mark
and proceeds to step 5. (Any HDR2 or HDR3 labels are ignored.)


D. If no match occurs, the console advances the tape position over the next three tape
marks and reads the next record. If another tape mark is found, the logical end of
volume has been encountered and the attempt to load the bootstrap image aborts.
Otherwise, the record should be the HDR1 label for the next file on the tape and the
console proceeds at step A.


The console aborts the attempt to load the bootstrap image whenever an unexpected
tape mark is encountered, the tape runs off the end, or a hardware error occurs.


5. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


6. The console reads the primary bootstrap image from tape into system memory; if any
error occurs or if the tape runs off the end, the attempt to load the bootstrap image
aborts.

The transfer from tape begins at the current tape position and continues until a tape
mark is encountered. The image is read into a virtually contiguous system memory
buffer; the starting virtual address is 0000 0000 2000 0000₁₆. (See Section 3.4.1.2.)


7. The console checks that the bootstrap file was properly closed by:


A. Reading the record after the tape mark and verifying that the record is an EOF1
label. If not, the attempt to load the bootstrap image aborts.


B. Searching for a subsequent tape mark. If a tape mark is not found, the bootstrap file
was improperly closed and the attempt to load the bootstrap image aborts. (Any
EOF2 and EOF3 labels are ignored.)


8. The console generates the AUDIT_LOAD_DONE message if the audit trail is enabled.


9. The console prepares to transfer control to the bootstrap as described in Section 3.4.1.6.
The console does not rewind or otherwise change the position of the tape after reading
the bootstrap image.
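

The record-level checks used for ANSI-formatted tape can be sketched as follows. The function names are hypothetical and the file name in the usage note is only an example. Two details are assumptions taken from the conventional ANSI label layout rather than from this section: the 17-character File Identifier is taken to occupy the bytes immediately after the 4-character label name, and it is taken to be padded with trailing spaces.

```python
def classify_first_record(record: bytes) -> str:
    """Decide the tape format from the length of the first record."""
    if len(record) == 80:
        return "ansi"            # may be an ANSI-formatted tape
    if len(record) == 512:
        return "boot-blocked"
    return "abort"               # any other length aborts the load attempt


def has_label(record: bytes, name: str) -> bool:
    """A label is recognized by its ASCII name at the start of the record."""
    return record.startswith(name.encode("ascii"))


def hdr1_matches(record: bytes, file_name: str) -> bool:
    """Compare a file name with the 17-character File Identifier field."""
    if not has_label(record, "HDR1"):
        return False
    # Assumed layout: identifier in bytes 4..20, padded with spaces.
    identifier = record[4:21].decode("ascii", errors="replace").rstrip(" ")
    return identifier == file_name
```

A first record of `b"VOL1"` followed by 76 spaces classifies as `"ansi"` and passes `has_label(record, "VOL1")`; an HDR1 record whose identifier field holds `APB.EXE` padded to 17 characters matches `hdr1_matches(record, "APB.EXE")`.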


3.6.2.2 Bootstrapping from Boot-Blocked Tape 


Bootstrapping from a boot-blocked tape is similar to the local disk bootstrapping described in 
Section 3.6.1. The first tape record must be 512 bytes and must follow the format given for 
disk boot blocks as shown in Figure 3-6. The STARTING LBN and FLAGS fields are MBZ
for tape boot blocks.


All tape records that comprise the primary bootstrap must be 512 bytes in size. If the console 
encounters records of any other size, the attempt to load the bootstrap image aborts. 


A local tape boot block bootstrap proceeds as follows: 


1. The console generates the AUDIT_TAPE_BBLOCK message if the audit trail is
enabled.


2. The console validates the boot block CHECKSUM; if the checksum is not validated,
the attempt to load the bootstrap image aborts. The console computes the checksum of
the first 63 quadwords in the block as a 64-bit sum, ignoring overflow. The computation
includes both reserved regions and the MBZ fields. The computed checksum is
compared to the CHECKSUM at [BB+504].


3. The console generates the AUDIT_CHECKSUM_GOOD message if the audit trail is
enabled.


4. The console ensures that the COUNT is non-zero; otherwise the attempt to load the
bootstrap image aborts. The count field indicates the number of subsequent 512-byte
records that contain the primary bootstrap.


5. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


6. The console reads COUNT subsequent records from the tape into system memory.
The attempt to load the bootstrap image aborts if the console encounters any error,
encounters any record size other than 512 bytes, or the tape runs off the end.

The image is read into a virtually contiguous system memory buffer; the starting
virtual address is 0000 0000 2000 0000₁₆. (See Section 3.4.1.2.)


7. The console generates the AUDIT_LOAD_DONE message if the audit trail is enabled.


8. The console prepares to transfer control to the bootstrap as described in Section 3.4.1.6. 
The console does not rewind or otherwise change the position of the tape after reading 
the bootstrap image. 


3.6.3 ROM Bootstrapping 


An Alpha console may support bootstrapping from read-only memory (ROM). Bootstrap 
ROM is assumed to appear in multiple discontiguous regions of the physical address space. A 
given ROM region may contain multiple bootstrap images. A given bootstrap image must not 
span ROM regions. 


Each ROM bootstrap image is page aligned and begins with a boot block as shown in Figure 
3-7. The ROM boot block is similar to the local disk and tape boot block shown in Figure 3-6. 


Figure 3-7: Alpha ROM Boot Block


A ROM bootstrap proceeds as follows:


1. The console locates the specified ordinal ROM bootstrap image; if the bootstrap image 
cannot be located, the attempt to load the bootstrap image aborts. 


The console locates the ROM bootstrap image by searching ROM regions beginning 
with the ROM region with the lowest physical address and proceeding upward to the 
ROM region with the highest physical address. 

The search proceeds as follows: 

A. The console verifies that the page contains a ROM bootstrap image:


— The low-order byte of the first quadword must be 80₁₆.


— The high-order longword of the first quadword must be the one’s complement
of the low-order longword.


— The sixth quadword must contain the checksum of the first five quadwords.
The checksum is computed as a 64-bit sum, ignoring overflow.


B. The console generates the AUDIT_BOOT_TYPE<string> message for each valid
boot block, if the audit trail is enabled. The <string> is the ISO Latin-1 string
contained in the BOOTSTRAP ID quadword.


C. If the specified ordinal image number has been reached, the console proceeds to
step 2.


D. Otherwise, the console uses the IMAGE LENGTH at [BB+24] to determine the
offset to the next ROM region page to be searched. The console repeats the process at
step A.


2. The console computes the starting physical address of the bootstrap image by adding
the physical address OFFSET at [BB+16] to the starting physical address of the boot
block [BB].


3. The console verifies the accessibility of each page of the bootstrap image. If any page is
inaccessible, the attempt to load the bootstrap image is aborted.


4. The console generates the AUDIT_BSTRAP_ACCESSIBLE message if the audit trail
is enabled.


5. If requested, the console validates the IMAGE CHECKSUM at [BB+08]; if the
checksum is not validated, the attempt to load the bootstrap image aborts. The console
computes the checksum of all quadwords in the bootstrap image as a 64-bit sum, ignoring
overflow. The existence and implementation of the mechanism for requesting this
validation is implementation specific.


6. The console generates the AUDIT_BSTRAP_GOOD message if the audit trail is
enabled.


7. If requested, the console copies the bootstrap image from ROM into system memory
(RAM). The image is copied into a virtually contiguous buffer starting at virtual
address 0000 0000 2000 0000₁₆. (See Section 3.4.1.2.) If the audit trail is enabled, the
console generates the AUDIT_LOAD_BEGINS message before beginning the copy and
the AUDIT_LOAD_DONE message after the copy completes successfully.


8. The console prepares to transfer control to the bootstrap as described in Section 3.4.1.6.
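

The three validity checks on a candidate ROM boot block, and the address computation that follows them, can be sketched as shown below. The function names are hypothetical and quadwords are assumed to be little-endian. The offsets used ([BB+16] for OFFSET and the sixth quadword for the boot block checksum) are the ones given in the procedure above.

```python
import struct

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def is_rom_boot_block(page: bytes) -> bool:
    """Apply the three checks to the first six quadwords of a page."""
    if len(page) < 48:
        return False
    qw = struct.unpack("<6Q", page[:48])
    low, high = qw[0] & MASK32, qw[0] >> 32
    if (low & 0xFF) != 0x80:
        return False                     # low-order byte must be 80 (hex)
    if high != (~low & MASK32):
        return False                     # high longword: complement of low
    return qw[5] == (sum(qw[:5]) & MASK64)   # checksum of first five quadwords


def image_start(bb_address: int, page: bytes) -> int:
    """Starting physical address: boot block address plus OFFSET at [BB+16]."""
    (offset,) = struct.unpack_from("<Q", page, 16)
    return bb_address + offset
```

A console scanning a ROM region would call `is_rom_boot_block` on each candidate page and, for the selected ordinal image, pass the page's physical address to `image_start`.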


3.6.4 Network Bootstrapping 


An Alpha system may support bootstrapping over one or more network communication 
devices and data link protocols. The console actions depend on the network device, data link 
protocol, and remote server capabilities. 


An Alpha system can use the DIGITAL Network Architecture Maintenance Operations Proto- 
col (MOP), or the BOOTP-UDP/IP network protocol, to bootstrap an Alpha system. See the 
MOP or BOOTP-UDP/IP specification for a detailed description. 


A network bootstrap proceeds as follows: 


1. The console determines if a bootstrap file name is to be used. The file name is taken
from the BOOT command or the BOOT_FILE environment variable. If no file name is
specified on the BOOT command and BOOT_FILE is null, no file name will be used.


2. The console generates the AUDIT_BOOT_REQ<filename> message if the audit trail is
enabled.


3. The console issues the appropriate (MOP or BOOTP-UDP/IP) bootstrap request
message(s).


4. The console receives an appropriate response (MOP or BOOTP-UDP/IP) from a
remote bootstrap server. If no such response is received, the attempt to load the
bootstrap image aborts.


5. The console generates the AUDIT_BSERVER_FOUND message if the audit trail is
enabled.


6. The bootstrap load proceeds, using the appropriate network protocol.


7. When the console receives the first portion of the bootstrap image, the console
generates the AUDIT_LOAD_BEGINS message if the audit trail is enabled.


8. The console loads the initial portion of the bootstrap image into a virtually contiguous
system memory buffer; the starting virtual address is 0000 0000 2000 0000₁₆. (See
Section 3.4.1.2.)


9. When the bootstrap image has been loaded, the console generates the
AUDIT_LOAD_DONE message if the audit trail is enabled.


10. The console prepares to transfer control to the bootstrap program as described in
Section 3.4.1.6.


If any error occurs, the attempt to load the bootstrap image aborts. 


3.7 BB_WATCH 


The following list offers important points about BB_WATCH: 


1. BB_WATCH is the correct name for this entity. Although incorrect terminology, TOY,
TODR, and watch chip, when used in the context of an Alpha system, are equivalent in
meaning to the BB_WATCH.


2. System software must directly manipulate the BB_WATCH through an
implementation-dependent interface.


3. System software makes the decision where to acquire known time; if a BB_WATCH is
present, it may be used as the provider of known time.


4. Systems are not required to have a BB_WATCH.


Software Note:
However, all systems that support OpenVMS Alpha or DIGITAL UNIX on Alpha
must have a BB_WATCH.


5. If a BB_WATCH is present in a system, it meets the following requirements:


— It has an accuracy of at least 50 ppm regardless of whether power is applied to the
system.


— It has a resolution of at least 1 second (that is, it is read and written in units of a
second or better).


— Changing the entirety of the time maintained by the BB_WATCH takes under 1
second.


— It has battery backup to survive a loss of power.


6. A BB_WATCH is always accessible to the primary processor. That is, a processor must
be able to access a BB_WATCH directly (it must not need to go through another
processor to access it) in order to be a candidate for primary processor.


7. The number of BB_WATCH entities in a system is either one for the entire system or
one for each processor in the system; which of the two options a system chooses is
implementation dependent. If the latter option is chosen (one BB_WATCH per
processor), writing one BB_WATCH does not update another.


8. Although writing the BB_WATCH takes less than one second, it may not be a fast oper- 
ation. Software should avoid frequently writing the BB_WATCH lest it negatively 
impact performance. 


9. The processor and its PALcode never change the value of BB_WATCH except under
the direction of system software. (The console, boot programs, and remote console
clients are not system software.) The console, its PALcode, and any console application
(including a diagnostic supervisor) never change BB_WATCH except under the
direction of the console operator, even when the CPU is halted, the processor is being
initialized, or the BB_WATCH has an invalid time.


Programming Note: 


The Primary-Eligible (PE) bit in the per-CPU slot of the HWRPB for each processor 
indicates, among other things, whether the CPU has access to a BB_WATCH. See 
Chapter 2. 


The description of primary switching details the actions taken in a multiprocessor system, 
including the requirement for the primary processor to have access to the BB_WATCH.


3.8 Implementation Considerations 


3.8.1 Embedded Console 


In an embedded console implementation, the console executes on the same processor as the 
operating system. In such an implementation, the state transitions as experienced by the pro- 
cessor are more conceptual. For example, the processor acting as the console will be executing 
instructions when in the halted state. The processor may also field console I/O mode excep- 
tions and interrupts. 


An embedded console may be implemented as an extension of PALcode or as a distinct soft- 
ware entity. The console may execute from dedicated RAM or ROM on the processor or, after 
console initialization, may execute from main memory. 


An embedded console implementation must include a mechanism by which the primary pro- 
cessor can be forced into console I/O mode from program I/O mode. This enables the console 
operator to gain control of the system regardless of the state of the system software. See Sec- 
tion 1.1 for recommended and required mechanisms. 


3.8.1.1 Multiprocessor Considerations 


In a multiprocessor system, selection of the primary processor occurs before any access to 
main memory by any of the processors. At system cold start, each of the processors will be 
executing in console I/O mode. The necessary memory for console execution must be
independent of main memory; the console must be executing from dedicated console RAM or ROM
and/or a suitably configured processor cache.


The selection of the console primary requires one or more hardware registers with state that is 
shared by all processors. One possible example is a mutex contained in a single-bit register 
accessed only with LDQ_L/STQ_C instructions. The primary successfully gains ownership of 
the mutex. Implementations should include mechanisms for operator override of the selection 
process and for recovery if the selection process fails. 
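
The selection step described above can be modeled in software. The following Python sketch is illustrative only and is not console code: the class `ElectionRegister` and the function `elect_primary` are hypothetical names, and a `threading.Lock` stands in for the atomicity that an LDQ_L/STQ_C sequence on a single-bit hardware register would provide.

```python
import threading


class ElectionRegister:
    """Models a single-bit hardware register shared by all processors."""

    def __init__(self):
        self._bit = 0
        # The lock only models hardware atomicity; it is not part of the
        # architecture being described.
        self._guard = threading.Lock()

    def try_acquire(self):
        """Stand-in for LDQ_L/STQ_C: test the bit and set it atomically.

        Returns True only for the one caller that changed the bit from 0 to 1.
        """
        with self._guard:
            if self._bit:
                return False
            self._bit = 1
            return True


def elect_primary(cpu_ids):
    """Let every CPU contend once; return the list of CPUs that won."""
    register = ElectionRegister()
    winners = []

    def contend(cpu_id):
        if register.try_acquire():
            winners.append(cpu_id)  # this CPU becomes the console primary
        # Losers become console secondaries and wait to be notified.

    threads = [threading.Thread(target=contend, args=(c,)) for c in cpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Whatever the interleaving, exactly one contender observes the bit clear, so exactly one primary is selected. Operator override and recovery from a failed selection are outside this sketch.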


Once a console primary has been selected, the console secondaries take no further action until 
appropriately notified by the primary. In particular, console secondaries must not access main 
memory. The console primary is responsible for building the HWRPB and any console-inter- 
nal data structures (such as environment variables) for the secondaries. When these structures 
have been initialized, the console primary must be able to signal one or more of the secondar- 
ies by additional hardware register(s). 


The console primary allocates a HWRPB in main memory, initializes it, and stores its physical 
address in an implementation-specific, nonvolatile manner. The console primary then indi- 
cates the presence of the HWRPB and its location to all secondaries by an 
implementation-specific mechanism. 


On system restarts, the console primary identifies itself by comparing its WHAMI register con- 
tents with the Primary CPU ID value stored in the HWRPB. 


When executing in console I/O mode, all processors must observe the same values of all con- 
sole environment variables. The values of the AUTO_ACTION and BOOT_RESET 
environment variables are particularly important. After failing to become the console primary 
processor, a console secondary waits to be notified that a valid HWRPB exists. Upon such 
notification by the primary, the console secondaries use the address provided by the primary to 
locate the HWRPB. The primary may be in either program I/O mode or console I/O mode. 


On cold bootstrap, a console secondary must not access main memory until notified by the pri- 
mary that a valid HWRPB exists. Thus, there must exist a mechanism that is not based on 
main memory whereby the primary may signal each of the secondaries. On warm bootstrap or 
restart, a secondary processor must locate its per-CPU slot in the HWRPB and poll its 
RXRDY bit. 
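
The two cases above can be summarized as a small decision function. This Python sketch is a model under stated assumptions, not an implementation: the function names and the returned step labels are hypothetical, and the sketch assumes that bit n of an RXRDY bitmask corresponds to CPU n.

```python
def rxrdy_pending(rxrdy_bitmask, cpu_id):
    """True if the RXRDY bit for cpu_id is set (assumes bit n maps to CPU n)."""
    return bool((rxrdy_bitmask >> cpu_id) & 1)


def secondary_next_step(boot_kind, hw_signal_seen, rxrdy_bitmask, cpu_id):
    """Return what a console secondary should do next.

    boot_kind      -- "cold", "warm", or "restart"
    hw_signal_seen -- True once the primary has signaled through a
                      mechanism that is not based on main memory
    """
    if boot_kind == "cold":
        # Main memory must not be touched until the primary reports
        # that a valid HWRPB exists.
        if not hw_signal_seen:
            return "wait_without_touching_memory"
        return "locate_hwrpb"
    if boot_kind in ("warm", "restart"):
        # The HWRPB already exists, so the secondary polls its RXRDY bit.
        if rxrdy_pending(rxrdy_bitmask, cpu_id):
            return "handle_message"
        return "poll_rxrdy"
    raise ValueError("unknown boot kind: %r" % (boot_kind,))
```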


Console processors must locate the HWRPB without searching memory; such a search consti- 
tutes a security hole. One possible implementation is to use an environment variable or other 
shared console data structure. The address of the HWRPB must be nonvolatile across power 
failures in systems that support powerfail recovery. 


Console implementations that support SAVE_ENV must be able to execute the routine simul- 
taneously on each processor. System software use of SAVE_ENV requires care. System 
software must invoke SAVE_ENV on all available processors, but cannot ensure that the non- 
volatile storage is updated on processors that are not available at the time of update. If 
mismatch occurs, the console uses the nonvolatile values preserved by the primary processor. 


3.8.2 Detached Console


In a detached console implementation, the console executes on a separate and distinct hard- 
ware platform. A detached console may have cooperating special code that executes on one of 
the processors in the system configuration. 


Detached console implementations should provide a keep-alive function. System software 
should be able to detect failures of the path between the system platform and the console. The 
mechanism may be a single dedicated signal or periodic message exchange. System software 
should be able to continue to execute if a keep-alive failure occurs, and restoration of the con- 
nection (or console state) should not cause a system crash or other major state transition. The 
console should buffer any messages if a keep-alive failure occurs until reconnection occurs. 
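
One way to picture the keep-alive and buffering behavior is the following Python sketch. It assumes the periodic-message variant of the mechanism; the class name `DetachedConsoleLink`, the tick-based timeout, and the method names are all hypothetical and chosen only for illustration.

```python
from collections import deque


class DetachedConsoleLink:
    """Models a console-side view of the path to the system platform."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.last_heard = 0
        self.pending = deque()   # messages held while the path is down
        self.delivered = []      # messages that reached the other side

    def alive(self, now):
        """The path counts as alive if a heartbeat arrived recently."""
        return now - self.last_heard <= self.timeout_ticks

    def heartbeat(self, now):
        """Record a keep-alive message and flush anything buffered."""
        self.last_heard = now
        while self.pending:
            self.delivered.append(self.pending.popleft())

    def send(self, now, message):
        """Deliver immediately if the path is alive, otherwise buffer."""
        if self.alive(now):
            self.delivered.append(message)
        else:
            self.pending.append(message)
```

In this model a keep-alive failure changes no state other than where messages are held, which mirrors the requirement that reconnection should not cause a major state transition.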


Detached consoles may maintain a local console log. The logging device and format are
implementation specific.


3.9 Revision History


Revision 7.0, November 10, 1997


1. Added ECO 99, extended VA
2. Removed AXP
3. DEC OSF/1 -> Digital UNIX
4. Digital -> DIGITAL
5. OpenVMS AXP -> OpenVMS Alpha


Revision 6.0, December 12, 1994


1. Alpha -> Alpha AXP
2. Add ECO 59, PALcode base address change
3. Add ECO 56, DSRDB in HWRPB
4. Add ECO 55, MCES initialization state change
5. Add ECO 54, Additional memory cluster descriptor type
6. Add ECO 39, PALcode switching
7. Add ECO 72, Halt action in states and state transitions


Revision 5.0, May 12, 1992


1. Removed references to ELN
2. ULTRIX -> DEC OSF/1
3. Widget -> device
4. Added ECO 30 text part
5. Material rearranged according to SRM Rev 5 requirements
6. Added ECO 17, 23
7. Converted to SDML.
8. Replace previous Console Chapter with Console ECO 15
9. Includes 3 chapters and two appendices, renumber I/O Chapter
10. Material substantially changed or rearranged


A
Address space number (ASN) register 
at processor initialization, 3-20 
in initial HWPCB, 3-21 
with PALcode switching, 3-8 
AST enable (ASTEN) register 
at processor initialization, 3-20 
in initial HWPCB, 3-21 
AST summary (ASTSR) register 
at processor initialization, 3-20 
in initial HWPCB, 3-21 
AUTO_ACTION environment variable, 2-27 
overriding, 3-27 
state transitions and, 3-1 
with cold bootstrap, 3-9 
with error halts, 3—30 
with system restarts, 3-28 


B





BB_WATCH 
at power-up initialization, 3-4 
requirements, 3—43 
with powerfail interrupts, 3-28 
with primary console switching, 3-31 
with primary-eligible (PE) bit, 3-44 
BITMAP_CHECKSUM, memory cluster field, 
3-13 
BITMAP_PA, memory cluster field, 3-13 
BITMAP_VA, memory cluster field, 3-13 
Boot block on disk, 3-37 


BOOT_DEV environment variable, 2—27 

with loading system software, 3-19 
BOOT_FILE environment variable, 2—27, 3-39 

with loading system software, 3-19 
BOOT_OSFLAGS environment variable, 2-28 

with loading system software, 3—19 
BOOT_RESET environment variable, 2-28 

at system initialization, 3-3 

at warm bootstrap, 3-22 

overriding, 3-27 
with cold bootstrap, 3-9
BOOTDEF_DEV environment variable, 2—27 

with loading system software, 3-19 
BOOTED_DEV environment variable 

with loading system software, 3-19 
BOOTED_FILE environment variable, 2—28 

with loading system software, 3-19 
BOOTED_OSFLAGS environment variable, 2—28 

with loading system software, 3-19 
BOOTP-UDP/IP network protocol, 3-42 
Bootstrap address space, regions, 3-13 
Bootstrap-in-progress (BIP) flag 

at multiprocessor boot, 3-23 

at power-up initialization, 3-4 

at processor initialization, 3-20 

per-CPU state contains, 2-22 

state transitions and, 3-1 


with failed bootstrap, 3-18 

with secondary console, 3-26 
Bootstrapping, 3-1 

adding processor while running system, 3-26 

address space at cold, 3-13 

boot block in ROM, 3-41 

boot block on disk, 3-37 

cold in uniprocessor environment, 3-9 

control to system software, 3-21 

failure of, 3-18 

from disk, 3-37 

from magtape, 3-38 

from MOP-based network, 3-42 

from ROM, 3-41 

implementation considerations, 3—44 

loading page table space at cold, 3-14 

loading primary image, 3-36 

loading system software, 3-18 

MEMC table at cold boot, 3-12 

multiprocessor, 3-23 

PALcode loading at cold, 3-13 

processor initialization, 3-20

request from system software, 3-27

state flags with, 3-18 

system, 3-3 

unconditional, 3-27 

warm, 3-22 


C


CHAR_SET environment variable, 2—29 


Characters 


getting from console, 2-36 
writing to console terminal, 2—40 


Checksum, HWRPB field for, 2-10 
at multiprocessor boot, 3-23
Clock. See BB_WATCH 


CLOSE device routine, 2—48 

Clusters, memory, 3-10 

CONFIG block, in HWRPB, 2-11 
CONFIG offset, HWRPB field for, 2-9 
CONFIG. See Configuration data block
Configuration data block, 2-23 
Console 
console I/O mode, 3-3

data structure linkage, 2-69 

data structures loading at cold boot, 3-13 
definition, 1-1 

detached, 1-2 

detached implementations of, 3-46 
embedded, 1-2 

embedded implementation of, 3-44 
error halt and recovery, 3-30 

forcing entry to I/O mode, 3-36
HWRPB with, 2-1 

implementation registry, 1-2 
implementations, 1-2 

inter-console communications buffer, 2—76 
internationalization, 1-4 
interprocessor communications for, 2-75 
ISO Latin-1 support with, 1-4 

loading PALcode, 3-13 

loading system software, 3-18 

lock mechanisms, 1-2 

major state transitions, 3-2 

messages for, 1-3 

miscellaneous routines, 2-63 
multiprocessor boot, 3-23 
multiprocessor implementation of, 3-44 
presentation layer, 1-3 

processor state flags, 3-18 

program I/O mode, 3-3

remapping routines, 2—71 

required environment variables, 2-27 
requirements for, 1-2 

resetting, 2-42 

RESTORE_TERM routine, 3-34, 3-35 
SAVE_TERM routine, 3-34, 3-35 
secondary at multiprocessor boot, 3-25 
security for, 1-4 

sending commands to secondary, 2-77 
sending messages to primary, 2-78 
supported character sets, 2—30 
switching primary processors, 2-64 
with system restarts, 3-27 
Console callback routine block, in HWRPB, 2-10


Console callback routines, 2—31 
at cold boot, 3-13 
CTB describes, 2-73 
data structures for, 2-69 
fixing up the virtual address, 2-63 
HWRPB field for, 2-8
remapping, 2-71 
summary of, 2-32 
system software invoking, 2-32 
Console environment variables 


loading system software, 3-19 
See also Environment variables 


Console I/O mode, 3-3 

forcing entry to, 3-36
Console initialization mode, 3-3 
Console interface, 2-1 

Console routine block (CRB), 2-69 


console callback routines with, 2-69
initializing, 2-71 

offset, HWRPB field for, 2-8 
structure of, 2-69 


Console terminal block (CTB) 


console callback routines with, 2-69 
described, 2-33, 2-73 

HWRPB fields for, 2-8 

number, HWRPB field for, 2-8 
offset, HWRPB field for, 2-8 

size, HWRPB field for, 2-8 
structure of, 2-74 


Console terminal routines, 2—33 


Context valid (CV) flag 


at multiprocessor boot, 3-23 
at processor initialization, 3-20 
per-CPU state contains, 2—22 


CPU ID, HWRPB field for primary, 2-6 
at multiprocessor boot, 3-23 
CPU slot offset, HWRPB field for, 2-8 


CTB table, in HWRPB, 2-10

CTB. See Console terminal block 

Current PALcode, 3-5 

Cycle counter frequency, HWRPB field for, 2-7 


D 


Data stream translation buffer (DTB), 2-14 
Detached console, 1-2 

DEVICE ID, CTB field for, 2-75 
DEVICE TYPE, CTB field for, 2-75 
Device-specific data (DSD), 2-75 

Disk bootstrap image, 3-37 

DISPATCH procedure, 2-70 
DISPATCH, CRB fields for, 2-70

DSD LENGTH, CTB field for, 2-75 

DSD, CTB field for, 2-75 

DSRDB block, in HWRPB, 2-11 

DSRDB offset, HWRPB field for, 2-10 

DSRDB structure, 2—23 

DTB. See data stream translation buffer 
DUMP_DEV environment variable, 2-28 

Dynamic system recognition data block. See DSRDB 


E 


Embedded console, 1-2 

ENABLE_AUDIT environment variable, 2-29, 
3-36 

ENTRY, CRB field for, 2—71 


Environment variables, 2—25 
at power-up initialization, 3-4 
at processor initialization, 3-20 
getting, 2-58 
resetting, 2-59 
routines described, 2-57 
saving, 2-60 
setting, 2-62 

Error halt and recovery, 3-30 

Error messages, console, 1-3 


Executive stack pointer (ESP) register 
in initial HWPCB, 3-21 
Extended VA size, HWRPB field for, 2-7 


F 


Field replaceable unit (FRU) 
offset, HWRPB field for, 2—9 
table description, 2-23 
table, in HWRPB, 2-11 
FIXUP console routine, 2-63 
procedure descriptor for, 2—70 
using, 2-72 
with PALcode switching, 3-7 
Floating-point enable (FEN) register


at processor initialization, 3-20 

in initial HWPCB, 3-21 

with PALcode switching, 3-8 
Floating-point registers 

with PALcode switching, 3-8 
FRU. See Field replaceable unit 


G 


GET_ENV variable routine, 2—58 
GETC terminal routine, 2-36 
ISO Latin-1 support and, 1-4
GH. See Granularity hint 


Granularity hint (GH) 
block in HWRPB, 2-14 
fields in, 2-14 


H 


HALT (PALcode) instruction 
state transitions and, 3-1 
Halt PCBB register, per-CPU slot field for, 2-19 


Halt processor, per-CPU slot fields for, 2-19 


Halt requested, per-CPU state flag, 2-21 
at multiprocessor boot, 3-23 
Hardware privileged context block (HWPCB) 
at cold boot, 3-21 
at warm boot, 3-22 
Hardware restart parameter block (HWRPB), 2-1 
fields for, 2-6 
loading at cold boot, 3-13 
overview of, 2-2 
size field in, 2-6 
structure of, 2-4 
with cold boot, 3-9 


HWRPB. See Hardware restart parameter block 


I


I/O device registers, at power-up initialization, 3-4 


I/O devices 


closing generic for access, 2-48 
device-specific operations for, 2-49 
generic routines for, 2-46 
opening generic for access, 2-51 
reading from generic, 2-53 
required implementation support for, 2-51 
writing to generic, 2—55 
Instruction stream translation buffer (ITB), 2-14 


Integer registers, with PALcode switching, 3-8 





Interprocessor console communications, 2—75 


Interrupt priority level (IPL) 


at processor initialization, 3-20 
with PALcode switching, 3-8 


Interval clock interrupt 


HWRPB field for, 2-7 
IOCTL console device routine, 2—49 


ISO Latin-1 support, 1-4 
PROCESS_KEYCODE and, 2-38 
ITB. See Instruction stream translation buffer 


K 


Kernel stack pointer (KSP) register 
at processor initialization, 3-20 
in initial HWPCB, 3-21
with PALcode switching, 3-8 
Keycode, translating, 2-38 


L 


LANGUAGE environment variable, 2-29 
Languages, supported by console, 2-30
LICENSE environment variable, 2—29 


Logout area 

length, per-CPU slot field for, 2-18 

physical address, per-CPU slot field for, 2-18 
LURT table, 2-24 


M 


Machine check error summary (MCES) register 
at processor initialization, 3-20 
with PALcode switching, 3-8 

Magtape bootstrap image 


ANSI format, 3-39 
boot blocked, 3—40 


Major modes, 3-3 











Major state transitions, 3-2 
console rules for, 3-2 
Major states, 3-1 
Maximum ASN value, HWRPB field for, 2-7 
MEMC. See Memory cluster descriptor 
MEMDSC. See Memory data descriptor table 
Memory cluster descriptor (MEMC) table 
structure of, 3-12 
Memory clusters, 3—10 


Memory data descriptor (MEMDSC) table 


at warm boot, 3-22 
in HWRPB, 2-11 


offset, HWRPB field for, 2-8 


structure of, 3-12 
with cold boot, 3-10 


Memory sizing at cold boot, 3-10 
MOP-based network bootstrapping, 3-42 


Multiprocessor bootstrapping, 3-23 
primary processor, 3-23 
Multiprocessor environment 
booting, 3-23 
console requirements, 2-26 


N 


Network bootstrapping, 3-42 
New PALcode, 3-5 


O


OPEN device routine, 2-51 
determines WRITE characteristics, 2—56 
Operator halted (OH) flag, 3-36 


at multiprocessor boot, 3-23 
per-CPU state contains, 2-22 





P


Page size, HWRPB field for, 2-6 


Page table base (PTBR) register 

at processor initialization, 3-20 

in initial HWPCB, 3-21 

with PALcode switching, 3-8 
Page table entry (PTE) 

calculating at cold boot, 3-16 
Page table space, loading at cold boot, 3-14 
Page tables

calculating base, 3-16 

initial mapping at cold boot, 3-16 
PAGES, CRB field for, 2-70 


PALcode 
current defined, 3-5 
identifying the image, 3-5 
initialization of, 3-4 
loading, 3-4 
loading at multiprocessor boot, 3-23 
new defined, 3-5 
switching, 3-5 
switching at multiprocessor boot, 3—24 
variants at loading, 3-4 
variants at multiprocessor boot, 3-24 
variants at processor initialization, 3-21 


PALcode available, per-CPU slot field for, 2-20 


PALcode loaded (PL) flag, 3-4 
at multiprocessor boot, 3-23 
per-CPU state contains, 2-21


PALcode loading at bootstrap, 3-13 


PALcode memory space 
length of, 2-17 
physical address of, 2-17 
with PALcode loading, 3-4 


PALcode memory valid (PMV) flag 


at multiprocessor boot, 3-23 
per-CPU state contains, 2-22 
with PALcode loading, 3-4 


PALcode revision, per-CPU slot field for, 2-17 
with PALcode switching, 3-6 

PALcode scratch space 
length of, 2-17 


physical address of, 2—17 
with PALcode loading, 3-4 


PALcode scratch value, in initial HWPCB, 3-21 
PALcode valid (PV) flag 
per-CPU state contains, 2-22
with PALcode loading, 3-4 


PALcode variation 2, 3-8 
PC halted, per-CPU slot fields for, 2-19 


Per-CPU slots 


block for, 2-10 

fields for, 2-17 

in HWRPB, 2-15 

number, HWRPB field for, 2-8 

size, HWRPB field for, 2-8 

state flags at multiprocessor boot, 3-23 
state flags in, 2-21 

with PALcode switching, 3-7 


Physical address size, HWRPB field for, 2-6 


Powerfail and recovery 
multiprocessor type of, 3-29 
split type of, 3-29 
uniprocessor type of, 3-28 
united type of, 3-29 
Powerfail restart (PR) flag 
powerfail and recovery, 3-29 
Power-up initialization, 3-3 
Primary bootstrap image 
format of, 3-36 
loading at cold, 3-14 
Primary processor 
at multiprocessor boot, 3-23 
definition of, 1-1 
modes for, 3-3 
running at multiprocessor boot, 3-25 
switching from, 3-31 
Primary-eligible (PE) bit 
at multiprocessor boot, 3-23 
with BB_WATCH, 3-44 
with console switching, 3-31 
Privileged context block base (PCBB) register 
at processor initialization, 3-20 
with PALcode switching, 3-8 
PROCESS_KEYCODE console terminal routine, 
2-38 
Processor 
adding to running system, 3-26 
states and modes, 3-1 
Processor available (PA) flag 
at multiprocessor boot, 3-23 
per-CPU state contains, 2—22 
Processor cycle counter (PCC) register 
in initial HWPCB, 3-21 
Processor initialization, 3—20 
Processor modes, 3-3 


Processor present (PP) flag 


at multiprocessor boot, 3—23 
per-CPU state contains, 2—22 


Processor status (PS) register 
at processor initialization, 3-20
with PALcode switching, 3-8 


Processor unique value, 3-8 
Processor unique value (unique) register 


in initial HWPCB, 3-21 

with PALcode switching, 3-8 
Processor, per-CPU slot field for 

halt, 2-19 

revision, 2-18 

serial number, 2-18 

software compatibility, 2-20 

type, 2-18 

variation, 2-18 
Processors, switching primary, 2—64 
Program counter (PC) register 

with PALcode switching, 3-8 
Program I/O mode, 3-3 
PSWITCH console routine, 2-64, 3-32 


PUTS console terminal routine, 2—40 


R 


READ device routine, 2-53 
Reason-for-halt code 

at power-up initialization, 3-4 
Regions, bootstrap address space, 3-13 
RESET_ENV variable routine, 2-59 
RESET_TERM console terminal routine, 2-42 
RESTART RTN VA, HWRPB field for, 2-9
RESTART value, HWRPB field for, 2—9 


Restart-capable (RC) flag 


at multiprocessor boot, 3-23 

at processor initialization, 3-20 
per-CPU state contains, 2—22 
state transitions and, 3-1 

with failed bootstrap, 3-18 
with secondary console, 3-26 


RESTORE_TERM console routine, 3-34, 3-35 
RESTORE_TERM RTN VA, HWRPB field for, 2-9 
RESTORE_TERM value, HWRPB field for, 2-9 
Revision, HWRPB field for, 2-6 

ROM boot block structure, 3-41 

ROM bootstrapping, 3-41 

RX BUFFER, field in RXTX buffer area, 2-77 
RXLEN, field in RXTX buffer area, 2—77 

RXRDY bitmask, HWRPB field for, 2—10 


RXRDY flag, 2-75 
at multiprocessor boot, 3-23 
RXTX buffer area, 2-76 
per-CPU slot field for, 2-20 


S


SAVE_ENV variable routine, 2-60 
SAVE_TERM console routine, 3-34, 3-35 
SAVE_TERM RTN VA, HWRPB field for, 2-9 
SAVE_TERM value, HWRPB field for, 2-9 


Secondary processors 


at multiprocessor boot, 3-23 
definition of, 1-1 
modes for, 3-3 

SET_ENV variable routine, 2-62 


SET_TERM_CTL terminal console routine, 2-43 
SET_TERM_INT console terminal routine, 2-44


Software interrupt summary (SISR) register 
at processor initialization, 3-20 
State flags, per-CPU slot field for, 2-17 


Supervisor stack pointer (SSP) register
in initial HWPCB, 3-21 
SWPPAL (PALcode) instruction 
with PALcode switching, 3-6 


System crash, requesting, 3-31 


System cycle counter (SCC) register 
at processor initialization, 3-20 
System initialization, 3-3 
System restarts, 3-27 
error halt and recovery, 3-30 
forcing console I/O mode, 3-36 
powerfail and recovery (multiprocessor), 3-29 
powerfail and recovery (split), 3-29 
powerfail and recovery (uniprocessor), 3-28 
powerfail and recovery (united), 3-29 
primary switching, 3-31 
requesting a crash, 3-31
RESTORE_TERM routine, 3-34, 3-35 
restoring terminal state, 3-33 
SAVE_TERM routine, 3-34, 3-35 
saving terminal state, 3-33
System serial number, HWRPB field for, 2-7 
System value (sysvalue) register 
with PALcode switching, 3-8 
System variation field (HWRPB) 
bit summary, 2-13
System, HWRPB field for 
revision code, 2-7, 2-11 
serial number, 2-11 
type, 2-7, 2-13 
variation, 2-7, 2-13 
T 
Tape. See Magtape 
TB hint offset, HWRPB field for, 2-8 


TBB. See Translation buffer hint block 
Terminal console, setting controls, 2-43
Terminals, setting interrupts for, 2—44 
TESTED_PAGES, memory cluster field, 3-13 
Translation buffer hint block (TBB), 2-10, 2-14 
TTY_DEV environment variable, 2-29 

with CTB, 2-73 
TX BUFFER, field in RXTX buffer area, 2-77 
TXLEN, field in RXTX buffer area, 2—77 
TXRDY bitmask, HWRPB field for, 2-10 


TXRDY flag, 2-75 
at multiprocessor boot, 3-23 


U 


Unique. See Processor unique value 





User stack pointer (USP) register 
in initial HWPCB, 3-21


V 


Validation, HWRPB field for, 2-6 
Virtual memory regions, initial, 3-16 


Virtual page table base (VPTB) 


HWRPB field for, 2-8 
with PALcode switching, 3-7 


VPTB. See Virtual page table base 


W 


Warm bootstrapping, 3—22 
Who-Am-I (WHAMI) register 


at processor initialization, 3-20 
with PALcode switching, 3-8 


WRITE device routine, 2—55 
characteristics determined by OPEN, 2-56 


Appendixes
The following appendixes are included in the Alpha System Reference Manual: 


• Appendix A, Software Considerations

• Appendix B, IEEE Floating-Point Conformance

• Appendix C, Instruction Summary

• Appendix D, Registered System and Processor Identifiers

• Appendix E, Waivers and Implementation-Dependent Functionality


Appendix A


Software Considerations 


A.1 Hardware-Software Compact 


The Alpha architecture, like all RISC architectures, depends on careful attention to data align- 
ment and instruction scheduling to achieve high performance. 


Since there will be various implementations of the Alpha architecture, it is not obvious how 
compilers can generate high-performance code for all implementations. This chapter gives 
some scheduling guidelines that, if followed by all compilers and respected by all implementa- 
tions, will result in good performance. As such, this section represents a good-faith compact 
between hardware designers and software writers. It represents a set of common goals, not a 
set of architectural requirements. Thus, an Appendix, not a Chapter. 


Many of the performance optimizations discussed below provide an advantage only for fre- 
quently executed code. For rarely executed code, they may produce a bigger program that is 
not any faster. Some of the branching optimizations also depend on good prediction of which 
path from a conditional branch is more frequently executed. These optimizations are best deter- 
mined by using an execution profile, either an estimate generated by compiler heuristics, or a 
real profile of a previous run, such as that gathered by PC-sampling in PCA.


Each computer architecture has a "natural word size." For the PDP-11, it is 16 bits; for VAX, 
32 bits; and for Alpha, 64 bits. Other architectures also have a natural word size that varies
between 16 and 64 bits. Except for very low-end implementations, ALU data paths, cache 
access paths, chip pin buses, and main memory data paths are all usually the natural word size. 


As an architecture becomes commercially successful, high-end implementations inevitably 
move to double-width data paths that can transfer an aligned (at an even natural word address) 
pair of natural words in one cycle. For Alpha, this means 128-bit wide data paths will
eventually be implemented. It is difficult to get much speed advantage from paired transfers unless
the code being executed has instructions and data appropriately aligned on aligned octaword 
boundaries. Since this is difficult to retrofit to old code, the following sections sometimes 
encourage "over-aligning" to octaword boundaries in anticipation of high-speed Alpha 
implementations.


In some cases, there are performance advantages to aligning instructions or data to
cache-block boundaries, or putting data whose use is correlated into the same cache block, or 
trying to avoid cache conflicts by not having data whose use is correlated placed at addresses 
that are equal modulo the cache size. Since the Alpha architecture will have many
implementations, an exact cache design cannot be outlined here. Nonetheless, some expected bounds can
be stated. 


• Small (first-level) cache sizes will likely be in the range 2 KB to 64 KB

• Small cache block sizes will likely be 16, 32, 64, or 128 bytes

• Large (second- or third-level) cache sizes will likely be in the range 128 KB to 8 MB

• Large cache block sizes will likely be 32, 64, 128, or 256 bytes

• TB sizes will likely be in the range 16 to 1024 entries


Thus, if two data items need to go in different cache blocks, it is desirable to make them at 
least 128 bytes apart (modulo 2 KB). Doing so creates a high probability of allowing both 
items to be in a small cache simultaneously for all Alpha implementations.
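
The spacing guideline can be checked mechanically. The Python sketch below is one conservative reading of "at least 128 bytes apart (modulo 2 KB)": it measures the distance between two addresses around a 2 KB ring and requires that distance to be at least the largest expected small-cache block size. The function name `well_separated` and the constant names are hypothetical.

```python
SMALL_CACHE_BYTES = 2 * 1024    # smallest expected first-level cache
MAX_SMALL_BLOCK_BYTES = 128     # largest expected small-cache block


def well_separated(addr_a, addr_b):
    """True if the two addresses are at least 128 bytes apart modulo 2 KB.

    The distance is taken in both directions around the 2 KB ring, so
    addresses that differ by nearly a multiple of 2 KB are rejected too.
    """
    delta = (addr_a - addr_b) % SMALL_CACHE_BYTES
    ring_distance = min(delta, SMALL_CACHE_BYTES - delta)
    return ring_distance >= MAX_SMALL_BLOCK_BYTES
```

For example, two items exactly 2 KB apart fail the check because they are 0 bytes apart modulo 2 KB and would map to the same location in a 2 KB direct-mapped cache.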


In each case below, the performance implication is given by an order-of-magnitude number: 1, 
3, 10, 30, or 100. A factor of 10 means that the performance difference being discussed will 
likely range from 3 to 30 across all Alpha implementations. 


A.2 Instruction-Stream Considerations 


The following sections describe considerations for the instruction stream. The material in this
section reflects the initial implementations and has not been extensively updated for EV5 and
EV6 and their derivatives.


A.2.1 Instruction Alignment 


Code PSECTs should be octaword aligned. Targets of frequently taken branches should be at 
least quadword aligned, and octaword aligned for very frequent loops. Compilers could use
execution profiles to identify frequently taken branches. 


Most Alpha implementations will fetch aligned quadwords of instruction stream (two
instructions), and many will waste an instruction-issue cycle on a branch to an odd longword.
High-end implementations may eventually fetch aligned octawords, and waste up to three
issue cycles on a branch to an odd longword. Some implementations may only be able to fetch
wide chunks of instructions every other CPU cycle. Fetching four instructions from an aligned
octaword can get at most one cache miss, while fetching them from an odd longword address
can get two or even three cache misses.
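
The cost of a misaligned branch target can be illustrated with a simplified model. The Python sketch below assumes that an implementation always fetches a naturally aligned chunk and discards every instruction in the chunk that precedes the branch target; real implementations may behave differently. The function name `wasted_fetch_slots` is hypothetical.

```python
INSTRUCTION_BYTES = 4  # every Alpha instruction is one longword


def wasted_fetch_slots(target_addr, fetch_bytes):
    """Count instruction slots fetched but skipped before the target.

    target_addr -- byte address of the branch target (longword aligned)
    fetch_bytes -- width of an aligned fetch: 8 for quadword fetch,
                   16 for octaword fetch
    """
    if target_addr % INSTRUCTION_BYTES != 0:
        raise ValueError("branch targets must be longword aligned")
    return (target_addr % fetch_bytes) // INSTRUCTION_BYTES
```

Under this model a quadword-fetch machine loses one slot on a branch to an odd longword, and an octaword-fetch machine loses up to three, which matches the figures quoted in the text.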


Quadword I-fetch implementors should give first priority to executing aligned quadwords
quickly. Octaword-fetch implementors should give first priority to executing aligned
octawords quickly, and second priority to executing aligned quadwords quickly. Dual-issue
implementations should give first priority to issuing both halves of an aligned quadword in
one cycle, and second priority to buffering and issuing other combinations.


Compilers should consider the following points when choosing a near-term and long-range
strategy for branch target alignment:


• EV4 will be the core in all Alpha products for the near term.


• EV4 issue-stalls on UNOP if Rx is busy (that is, an operation or load that has been
issued and has not yet delivered a new value to Rx).


• FNOP cannot be used when the compiler is producing "INTEGER_ONLY" code.
Therefore, on EV4, UNOP may be a better choice than NOP in many cases.


• EV5 and other future Alpha implementations will generally perform better when
UNOP is used to align branch targets. (The exception to this is any case where NOP or
FNOP can improve performance in a specific implementation by preventing "splitting."
Splitting occurs when at least one of a set of instructions sent to the issue stage is
operand issue-stalled or destination issue-stalled. Splitting prevents the issue stage of the
pipeline from emptying and the next set of instructions from being sent to the issue
stage. It is an implementation-specific effect.)


• OpenVMS Alpha, DIGITAL UNIX, and Windows NT Alpha use R30 as the stack
pointer. Utilities that symbolize instructions may choose to recognize only LDQ_U
R31,0(R30) for UNOP, and compilers generate this as the preferred form.


A.2.2 Multiple Instruction Issue - Factor of 3


Some Alpha implementations will issue multiple instructions in a single cycle. To improve the 
odds of multiple-issue, compilers should choose pairs of instructions to put in aligned quad- 
words. Pick one from column A and one from column B (but only a total of one 
load/store/branch per pair). 


Column A               Column B
Integer Operate        Floating Operate
Floating Load/Store    Integer Load/Store
Floating Branch        Integer Branch
                       BR/BSR/JSR


Implementors of multiple-issue machines should give first priority to dual-issuing at least the 
above pairs, and second priority to multiple-issue of other combinations. 


In general, the above rules will give a good hardware-software match, but compilers may want 
to implement model-specific switches to generate code tuned more exactly to a specific 
implementation.
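
The pairing rule lends itself to a small check that a scheduler could apply to each candidate quadword. The Python sketch below is illustrative: the class labels are hypothetical names for the table entries, and BR/BSR/JSR is assumed to belong to Column B alongside the integer branches.

```python
COLUMN_A = {"integer_operate", "floating_load_store", "floating_branch"}
COLUMN_B = {
    "floating_operate",
    "integer_load_store",
    "integer_branch",
    "br_bsr_jsr",          # assumed to sit in Column B
}
# Classes that count toward the "one load/store/branch per pair" limit.
LOAD_STORE_OR_BRANCH = {
    "floating_load_store",
    "floating_branch",
    "integer_load_store",
    "integer_branch",
    "br_bsr_jsr",
}


def can_pair(first, second):
    """True if the two instruction classes satisfy the pairing guideline."""
    one_from_each_column = (
        (first in COLUMN_A and second in COLUMN_B)
        or (first in COLUMN_B and second in COLUMN_A)
    )
    restricted = sum(c in LOAD_STORE_OR_BRANCH for c in (first, second))
    return one_from_each_column and restricted <= 1
```

An integer operate pairs with a floating operate or an integer load, but a floating load does not pair with an integer load because the pair would then contain two load/store/branch instructions.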


A.2.3 Branch Prediction and Minimizing Branch-Taken - Factor of 3


In many Alpha implementations, an unexpected change in I-stream address will result in about 
10 lost instruction times. "Unexpected" may mean any branch-taken or may mean a mispre- 
dicted branch. In many implementations, even a correctly predicted branch to a quadword 
target address will be slower than straight-line code.


Compilers should follow these rules to minimize unexpected branches:


1. Branch prediction is implementation specific. Based on execution profiles, compilers 
should physically rearrange code so that it has matching behavior. 


2. Make basic blocks as big as possible. A good goal is 20 instructions on average 
between branch-taken. This requires unrolling loops so that they contain at least 20 
instructions, and putting subroutines of less than 20 instructions directly in line. It also 
requires using execution profiles to rearrange code so that the frequent case of a condi- 
tional branch falls through. For very high-performance loops, it will be profitable to 
move instructions across conditional branches to fill otherwise wasted instruction issue 
slots, even if the instructions moved will not always do useful work. Note that using the 
Conditional Move instructions can sometimes avoid breaking up basic blocks. 


3. In an if-then-else construct whose execution profile is skewed even slightly away from 
50%-50% (51-49 is enough), put the infrequent case completely out of line, so that the 
frequent case encounters zero branch-takens, and the infrequent case encounters two 
branch-takens. If the infrequent case is rare (5%), put it far enough away that it never 
comes into the I-cache. If the infrequent case is extremely rare (error message code), 
put it on a page of rarely executed code and expect that page never to be paged in. 

4. There are two functionally identical branch-format opcodes, BSR and BR, as shown in 
Figure A-1. 


Figure A-1: Branch-Format BSR and BR Opcodes 


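 31      26 25    21 20                              0 
+----------+--------+---------------------------------+ 
|   BSR    |   Ra   |          Displacement           |  Branch Format 
+----------+--------+---------------------------------+ 
|   BR     |   Ra   |          Displacement           |  Branch Format 
+----------+--------+---------------------------------+ 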





Compilers should use the first one for subroutine calls, and the second for GOTOs. 
Some implementations may push a stack of predicted return addresses for BSR and 
not push the stack for BR. Failure to compile the correct opcode will result in 
mispredicted return addresses, and hence make subroutine returns slow. 


5. The memory-format JSR instruction, shown in Figure A-2, has 16 unused bits. These 
should be used by the compilers to communicate a hint about expected branch-target 
behavior (see Common Architecture, Chapter 4). 


Figure A-2: Memory-Format JSR Instruction 


[Figure: memory-format JSR instruction, with bit positions 31, 16, 15, and 0 marked.] 





If the JSR is used for a computed GOTO or a CASE statement, compile bits <15:14> 
as 00, and bits <13:0> such that (updated PC+Instr<13:0>*4) <15:0> equals 
(likely_target_addr) <15:0>. In other words, pick the low 14 bits so that a normal 
PC+displacement*4 calculation will match the low 16 bits of the most likely target 
longword address. (Implementations will likely prefetch from the matching cache 
block.) 


If the JSR is used for a computed subroutine call, compile bits <15:14> as 01, and 
bits <13:0> as above. Some implementations will prefetch the call target using the 
prediction and also push updated PC on a return-prediction stack. 


If the JSR is used as a subroutine return, compile bits <15:14> as 10. Some 
implementations will pop an address off a return-prediction stack. 


If the JSR is used as a coroutine linkage, compile bits <15:14> as 11. Some 
implementations will pop an address off a return-prediction stack and also push 
updated PC on the return-prediction stack. 
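

As a worked illustration of rule 5, the following Python sketch computes the 16-bit hint 
field. The function and dictionary names are hypothetical and exist only for this example; 
the bit assignments are the ones given above. The sketch assumes that both addresses are 
longword aligned, and it leaves bits <13:0> as zero for the return and coroutine cases, 
for which the text specifies only bits <15:14>. 

```python
# Hypothetical helper names; only the bit layout comes from rule 5 above.
HINT_KIND = {
    "computed_goto": 0b00,      # also used for CASE statements
    "subroutine_call": 0b01,
    "subroutine_return": 0b10,
    "coroutine_linkage": 0b11,
}


def jsr_hint_field(kind, updated_pc=0, likely_target=0):
    """Return bits <15:0> of a JSR for the given use and likely target."""
    if (updated_pc | likely_target) & 0x3:
        raise ValueError("addresses must be longword aligned")
    # Low 14 bits of the longword displacement. Masking also wraps a
    # negative (backward) displacement into two's-complement form.
    disp14 = ((likely_target - updated_pc) >> 2) & 0x3FFF
    return (HINT_KIND[kind] << 14) | disp14


def predicted_low16(updated_pc, hint_field):
    """Low 16 bits formed by a normal PC + Instr<13:0>*4 calculation."""
    return (updated_pc + (hint_field & 0x3FFF) * 4) & 0xFFFF


field = jsr_hint_field("subroutine_call", updated_pc=0x12000, likely_target=0x11FF0)
# The prediction matches the low 16 bits of the likely target address.
assert predicted_low16(0x12000, field) == 0x11FF0 & 0xFFFF
```

Only the low 16 bits of the target can be matched this way, which is why the text 
describes the hint as selecting a likely cache block rather than a full address. 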


Implementors should give first priority to executing straight-line code with no branch-takens 
as quickly as possible, second priority to predicting conditional branches based on the sign of 
the displacement field (backward taken, forward not-taken), and third priority to predicting 
subroutine return addresses by running a small prediction stack. (VAX traces show a stack of 
two to four entries correctly predicts most branches.) 


A.2.4 Improving I-Stream Density — Factor of 3 


Compilers should try to use profiles to make sure almost 100% of the bytes brought into an 
I-cache are actually executed. This requires aligning branch targets and putting rarely executed 
code out of line. 


A.2.5 Instruction Scheduling — Factor of 3 


The performance of Alpha programs is sensitive to how carefully the code is scheduled to min- 
imize instruction-issue delays. 


"Result latency" is defined as the number of CPU cycles that must elapse between an instruc- 
tion that writes a result register and one that uses that register, if execution-time stalls are to be 
avoided. Thus, with a latency of zero, the instruction writes a result register and the instruction 
that uses that register can be multiple-issued in the same cycle. With a latency of 2, if the writ- 
ing instruction is issued at cycle N, the reading instruction can issue no earlier than cycle N+2. 
Latency is implementation specific. 


Most Alpha instructions have a non-zero result latency. Compilers should schedule code so 
that a result is not used too soon, at least in frequently executed code (inner loops, as identified 
by execution profiles). In general, this will require unrolling loops and inlining short 
procedures. 


\Assume that implementations can dual-issue instructions. Assume that Load and JSR instruc- 
tions have a latency of 3, shifts and byte manipulation a latency of 2, integer multiply a 
latency of 10, and other integer operates a latency of 1. Assume floating multiply has a latency 
of 5, floating divide a latency of 10, and other floating operates a latency of 4. Scheduling to 
these latencies gives at least reasonable performance on current implementations. More pre- 
cise tables will be supplied in later versions of this Appendix as the information becomes 
available.\ 
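

The effect of these assumed latencies on a schedule can be estimated with a small model. 
The Python sketch below is illustrative only: the instruction tuples and function name are 
hypothetical, issue is modeled as strictly in order with at most two instructions per 
cycle, and the Column A/Column B pairing rules of Section A.2.2 are ignored. 

```python
# Assumed result latencies, taken from the paragraph above.
ASSUMED_LATENCY = {
    "load": 3, "jsr": 3, "shift": 2, "byte": 2, "imul": 10, "int": 1,
    "fmul": 5, "fdiv": 10, "fop": 4,
}


def earliest_issue_cycles(instrs, issue_width=2):
    """Return the issue cycle of each (kind, dest, sources) tuple.

    Hypothetical model: in-order issue, issue_width instructions per
    cycle, and a reader may not issue before writer_cycle + latency.
    """
    ready = {}              # register -> first cycle its value may be read
    cycles = []
    cycle, used = 0, 0      # current issue cycle and slots used in it
    for kind, dest, sources in instrs:
        need = max([ready.get(r, 0) for r in sources], default=0)
        if need > cycle:
            cycle, used = need, 0       # stall until operands are ready
        elif used == issue_width:
            cycle, used = cycle + 1, 0  # issue slots exhausted
        cycles.append(cycle)
        used += 1
        if dest is not None:
            ready[dest] = cycle + ASSUMED_LATENCY[kind]
    return cycles


naive = [
    ("load", "R1", ["R2"]),
    ("int", "R3", ["R1", "R4"]),    # uses the load result too soon
    ("int", "R5", ["R6", "R7"]),
    ("shift", "R8", ["R3"]),
]
scheduled = [naive[0], naive[2], naive[1], naive[3]]
print(earliest_issue_cycles(naive))
print(earliest_issue_cycles(scheduled))
```

In this model the independent integer operate moves into the otherwise idle slot beside 
the load once the code is reordered, which is the kind of improvement scheduling aims for. 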


Compilers should try to schedule code to match the above latency rules and also to match the 
multiple-issue rules. If doing both is impractical for a particular sequence of code, the latency 
rules are more important (since they apply even in single-issue implementations). 


Implementors should give first priority to minimizing the latency of back-to-back integer oper- 
ations, of address calculations immediately followed by load/store, of load immediately 
followed by branch, and of compare immediately followed by branch. Give second priority to 
minimizing latencies in general. 


A.3 Data-Stream Considerations 


The following sections describe considerations for the data stream. \The material in this sec- 
tion reflects the initial implementations and has not been extensively updated for EV5 and 
EV6 and their derivatives.\ 


A.3.1 Data Alignment — Factor of 10 


Data PSECTs should be at least octaword aligned, so that aggregates (arrays, some records, 
subroutine stack frames) can be allocated on aligned octaword boundaries to take advantage of 
any implementations with aligned octaword data paths, and to decrease the number of cache 
fills in almost all implementations. 


Aggregates (arrays, records, common blocks, and so forth) should be allocated on at least 
aligned octaword boundaries whenever language rules allow. In some implementations, a 
series of writes that completely fill a cache block may be a factor of 10 faster than a series of 
writes that partially fill a cache block, when that cache block would give a read miss. This is 
true of write-back caches that read a partially filled cache block from memory, but optimize 
away the read for completely filled blocks. 


For such implementations, long strings of sequential writes will be faster if they start on a 
cache-block boundary (a multiple of 128 bytes will do well for most, if not all, Alpha 
implementations). This applies to array results that sweep through large portions of memory, 
and to register-save areas for context switching, graphics frame buffer accesses, and other 
places where exactly 8, 16, 32, or more quadwords are stored sequentially. Allocating the 
targets at multiples of 8, 16, 32, or more quadwords, respectively, and doing the writes in 
order of increasing address will maximize the write speed. 
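

The allocation rule can be sketched as follows. This Python fragment is a minimal 
illustration with hypothetical names; the 128-byte block size is the multiple suggested 
above and is implementation specific. 

```python
CACHE_BLOCK_BYTES = 128  # multiple suggested in the text; implementation specific


def align_up(address, boundary=CACHE_BLOCK_BYTES):
    """Round address up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return (address + boundary - 1) & ~(boundary - 1)


def ascending_quadword_addresses(start, quadwords):
    """Store addresses for a block-aligned target, in increasing order."""
    base = align_up(start)
    return [base + 8 * i for i in range(quadwords)]


addresses = ascending_quadword_addresses(0x1001, 16)
# All 16 quadwords fall in one aligned 128-byte block, so the block is
# completely filled by the sequence of writes.
assert {a // CACHE_BLOCK_BYTES for a in addresses} == {0x1080 // CACHE_BLOCK_BYTES}
```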


Items within aggregates that are forced to be unaligned (records, common blocks) should gen- 
erate compile-time warning messages and inline byte extract/insert code. Users must be 
educated that the warning message means that they are taking a factor of 30 performance hit. 


\Compilers should consider supplying a switch that allows the compiler to pad aggregates to 
avoid unaligned data.\ 


Compiled code for parameters should assume that the parameters are aligned. Unaligned 
actuals will cause run-time alignment traps and very slow fixups. The fixup routine, if invoked, 
should generate warning messages to the user, preferably giving the first few statement 
numbers that are doing unaligned parameter access, and at the end of a run the total number of 
alignment traps (and perhaps an estimate of the performance improvement if the data were 
aligned). Users must be educated that the trap routine warning message means they are taking 
a factor of 30 performance hit. 


Frequently used scalars should reside in registers. Each scalar datum allocated in memory 
should normally be allocated an aligned quadword to itself, even if the datum is only a byte 
wide. This allows aligned quadword loads and stores and avoids partial-quadword writes 
(which may be half as fast as full-quadword writes, due to such factors as the need to 
read-modify-write a quadword to do quadword ECC calculation). 


Implementors should give first priority to fast reads of aligned octawords and second priority 
to fast writes of full cache blocks. 


A.3.2 Shared Data in Multiple Processors — Factor of 3 


Software locks are aligned quadwords and should be allocated to large cache blocks that either 
contain no other data or read-mostly data whose usage is correlated with the lock. 


Whenever there is high contention for a lock, one processor will have the lock and be using 
the guarded data, while other processors will be in a read-only spin loop on the lock bit. Under 
these circumstances, any write to the cache block containing the lock will likely cause excess 
bus traffic and cache fills, thus affecting performance on all processors that are involved and 
the buses between them. In some decomposed FORTRAN programs, refills of the cache 
blocks containing one or two frequently used locks can account for a third of all the bus band- 
width the program consumes. 


Whenever there is almost no contention for a lock, one processor will have the lock and be 
using the guarded data. Under these circumstances, it might be desirable to keep the guarded 
data in the same cache block as the lock. 


For the high-sharing case, compilers should assume that almost all accesses to shared data 
result in cache misses all the way back to main memory, for each distinct cache block used. 
Such accesses will likely be a factor of 30 slower than cache hits. It is helpful to pack corre- 
lated shared data into a small number of cache blocks. It is helpful also to segregate blocks 
written by one processor from blocks read by others. 


Therefore, accesses to shared data, including locks, should be minimized. For example, a 
four-processor decomposition of some manipulation of a 1000-row array should avoid access- 
ing lock variables every row, but instead might access a lock variable every 250 rows. 


Array manipulation should be partitioned across processors so that cache blocks do not thrash 
between processors. Having each of four processors work on every fourth array element 
severely impairs performance on any implementation with a cache block of four elements or 
larger. The processors all contend for copies of the same cache blocks and use only one quar- 
ter of the data in each block. Writes in one processor severely impair cache performance on all 
processors. 


A better decomposition is to give each processor the largest possible contiguous chunk of data 
to work on (N/4 consecutive rows for four processors and row-major array storage; N/4 col- 
umns for column-major storage). With the possible exception of three cache blocks at the 
partition boundaries, this decomposition will result in each processor caching data that is 
touched by no other processor. 
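

The difference between the two decompositions can be made concrete with a small count of 
cache blocks that more than one processor touches. The Python sketch below uses 
hypothetical function names and assumes a one-dimensional array of 1000 elements, four 
processors, and four elements per cache block. 

```python
def contiguous_partition(n_elements, n_procs):
    """Split range(n_elements) into n_procs contiguous, nearly equal chunks."""
    base, extra = divmod(n_elements, n_procs)
    chunks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def interleaved_partition(n_elements, n_procs):
    """The discouraged form: processor p takes every n_procs-th element."""
    return [range(p, n_elements, n_procs) for p in range(n_procs)]


def shared_blocks(partition, elements_per_block):
    """Count cache blocks touched by more than one processor."""
    owners = {}
    for proc, elements in enumerate(partition):
        for e in elements:
            owners.setdefault(e // elements_per_block, set()).add(proc)
    return sum(1 for procs in owners.values() if len(procs) > 1)


print(shared_blocks(contiguous_partition(1000, 4), 4))
print(shared_blocks(interleaved_partition(1000, 4), 4))
```

With the contiguous decomposition, sharing is confined to blocks that straddle a 
partition boundary (at most three for four processors); with the interleaved 
decomposition, every block is shared by all four processors. 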


Operating-system scheduling algorithms should attempt to minimize process migration from 
one processor to another. Any time migration occurs, there are likely to be a large number of 
cache misses on the new processor. 


Similarly, operating-system scheduling algorithms should attempt to enforce some affinity 
between a given device’s interrupts and the processor on which the interrupt-handler runs. I/O 
control data structures and locks for different devices should be disjoint. Observing these 
guidelines allows higher cache hit rates on the corresponding I/O control data structures. 


Implementors should give first priority to an efficient (low-bandwidth) way of transferring iso- 
lated lock values and other isolated, shared write data between processors. 


Implementors should assume that the amount of shared data will continue to increase, so over 
time the need for efficient sharing implementations will also increase. 


A.3.3 Avoiding Cache/TB Conflicts — Factor of 1 


Occasionally, programs that run with a direct-mapped cache or TB will thrash, taking exces- 
sive cache or TB misses. With some work, thrashing can be minimized at compile time. 


Note: 


No Alpha processor through and including the 21264 has implemented a direct-mapped 
TB. 


In a frequently executed loop, compilers could allocate the data items accessed from memory 
so that, on each loop iteration, all of the memory addresses accessed are either in exactly the 
same aligned 64-byte block or differ in bits VA<10:6>. For loops that go through arrays in a 
common direction with a common stride, this requires allocating the arrays, checking that the 
first-iteration addresses differ, and if not, inserting up to 64 bytes of padding between the 
arrays. This rule will avoid thrashing in small direct-mapped data caches with block sizes up 
to 64 bytes and total sizes of 2K bytes or more. 


Example: 
      REAL*4 A(1000),B(1000) 
      DO 60 i=1,1000 
60    A(i) = f(B(i)) 


Figures A-3, A-4, and A-5 show bad, better, and best allocation in cache, respectively. 


BAD allocation (A and B thrash in 8 KB direct-mapped cache): 


Figure A-3: Bad Allocation in Cache 

[Figure not reproduced; address axis marked 0, 4K, 8K, 12K, 16K.] 


BETTER allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be 
in cache simultaneously): 


Figure A-4: Better Allocation in Cache 

[Figure not reproduced; address axis marked 0, 4K, 8K+64, 12K, 16K.] 


BEST allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B can be in 
cache simultaneously, and both arrays fit entirely in 8 KB or bigger cache): 


Figure A-5: Best Allocation in Cache 

[Figure not reproduced.] 


In a frequently executed loop, compilers could allocate the data items accessed from memory 
so that, on each loop iteration, all of the memory addresses accessed are either in exactly the 
same 8 KB page, or differ in bits VA<17:13>. For loops that go through arrays in a common 
direction with a common stride, this requires allocating the arrays, checking that the 
first-iteration addresses differ, and if they do not, inserting up to 8K bytes of padding between 
the arrays. This rule will avoid thrashing in direct-mapped TBs and in some large direct-mapped 
data caches with total sizes of 32 pages (256 KB) or more. 


Usually, this padding will mean zero extra bytes in the executable image, just a skip in virtual 
address space to the next-higher page boundary. 


For large caches, the rule above should be applied to the I-stream, in addition to all the 
D-stream references. Some implementations will have combined I-stream/D-stream large 
caches. 


Both of the rules above can be satisfied simultaneously, thus often eliminating thrashing in all 
anticipated direct-mapped cache/TB implementations. 
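

Both checks have the same shape, so one helper parameterized by the bit range can serve 
for each. The Python sketch below is a minimal illustration with hypothetical names; it 
compares only the first-iteration addresses of two arrays and assumes the loop walks them 
with a common stride. 

```python
def index_bits(va, lo, hi):
    """Extract VA<hi:lo>."""
    return (va >> lo) & ((1 << (hi - lo + 1)) - 1)


def conflicts(va_a, va_b, lo, hi):
    """True if the addresses lie in different aligned 2**lo-byte units
    but have identical VA<hi:lo> bits."""
    same_unit = (va_a >> lo) == (va_b >> lo)
    return not same_unit and index_bits(va_a, lo, hi) == index_bits(va_b, lo, hi)


def pad_second_array(base_a, base_b, lo, hi):
    """Return base_b, moved up by one 2**lo-byte unit if it conflicts."""
    if conflicts(base_a, base_b, lo, hi):
        return base_b + (1 << lo)
    return base_b


# Small-cache rule: 64-byte blocks, VA<10:6>. A at 0 and B at 8K conflict.
b = pad_second_array(0, 8192, lo=6, hi=10)
# Page rule: 8 KB pages, VA<17:13>. Adding a whole page leaves VA<10:6>
# unchanged, so applying this second keeps the first rule satisfied.
b = pad_second_array(0, b, lo=13, hi=17)
print(b)
```

For the arrays of the example, the first rule moves B from 8K to 8K+64, which is the 
offset described for the BETTER allocation. 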


A.3.4 Sequential Read/Write — Factor of 1 


All other things being equal, sequences of consecutive reads or writes should use ascending 
(rather than descending) memory addresses. Where possible, the memory address for a block 
of 2**K bytes should be on a 2**K boundary, since this minimizes the number of different 
cache blocks used and minimizes the number of partially written cache blocks. 


To avoid overrunning memory bandwidth, sequences of more than eight quadword load or 
store instructions should be broken up with intervening instructions (if there is any useful 
work to be done). 


For consecutive reads, implementors should give first priority to prefetching ascending cache 
blocks and second priority to absorbing up to eight consecutive quadword load instructions 
(aligned on a 64-byte boundary) without stalling. 


For consecutive writes, implementors should give first priority to avoiding read overhead for 
fully written aligned cache blocks and second priority to absorbing up to eight consecutive 
quadword store instructions (aligned on a 64-byte boundary) without stalling. 


A.3.5 Prefetching — Factor of 3 


Prefetching can be directed toward a cache block (a cache line) in the primary cache. 


Alpha hardware, beginning with the 21164 (EV5) and subsequent, supports cache block 
prefetching. Cache block prefetching is performed by the following load operations to the R31 
or F31 register: 


Table A-1: Cache Block Prefetching 


Type               Instructions       Operation 

Normal Prefetch    LDL R31,xxx(Rn)    If the load operation hits in the Dcache, the 
                                      instruction is dismissed; otherwise, the 
                                      addressed cache block is allocated into the 
                                      Dcache. 

Prefetch with      LDS F31,xxx(Rn)    If the load operation hits a dirty, modified 
Modify Intent                         Dcache block, the instruction is dismissed. 
                                      Otherwise, the addressed cache block is allocated 
                                      into the Dcache for write access; its dirty and 
                                      modified bits are set. 

Prefetch, Evict    LDQ R31,xxx(Rn)    Prefetch a cache block and mark that block in an 
Next                                  associated cache to be evicted on the next cache 
                                      fill to an associated address. (This operation is 
                                      useful to prefetch data that is not to be 
                                      repeatedly referenced.) 


A.4 Code Sequences 


The following section describes code sequences. 


A.4.1 Aligned Byte/Word (Within Register) Memory Accesses 


The instruction sequences given in Common Architecture, Chapter 4, for byte-within-register 
accesses are worst-case code. More importantly, they do not reflect the instructions available 
with the BWX extension, described in the Common Architecture, Chapter 4, Byte Manipulation 
Instructions, and in Appendix D. If the BWX extension instructions are available, it is 
wise to consider them rather than the sequences that follow. 


The following sequences are appropriate if the BWX extension instructions are not available. 


In the common case of accessing a byte or aligned word field at a known offset from a pointer 
that is expected to be at least longword aligned, the common-case code is much shorter. 
"Expected" means that the code should run fast for a longword-aligned pointer and trap for 
unaligned. The trap handler may at its option fix up the unaligned reference. 


For access at a known offset D from a longword-aligned pointer Rx, let D.lw be D rounded 
down to a multiple of 4 ((D div 4)*4), and let D.mod be D mod 4. 


In the common case, the intended sequence for loading and zero-extending an aligned word is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
EXTWL  R1,#D.mod,R1         ! Picks up word at byte 0 or byte 2 


In the common case, the intended sequence for loading and sign-extending an aligned word is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
SLL    R1,#48-8*D.mod,R1    ! Aligns word at high end of R1 
SRA    R1,#48,R1            ! SEXT to low end of R1 

Note: 


The shifts often can be combined with shifts that might surround subsequent arithmetic 
operations (for example, to produce word overflow from the high end of a register). 


In the common case, the intended sequence for loading and zero-extending a byte is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
EXTBL  R1,#D.mod,R1         ! Picks up byte at byte 0, 1, 2, or 3 


In the common case, the intended sequence for loading and sign-extending a byte is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
SLL    R1,#56-8*D.mod,R1    ! Aligns byte at high end of R1 
SRA    R1,#56,R1            ! SEXT to low end of R1 


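In the common case, the intended sequence for storing an aligned word R5 is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
INSWL  R5,#D.mod,R3         ! Positions the new word at byte 0 or byte 2 
MSKWL  R1,#D.mod,R1         ! Clears the word being replaced 
BIS    R3,R1,R1             ! Merges the new word into the longword 
STL    R1,D.lw(Rx)          ! Stores the updated longword 


In the common case, the intended sequence for storing a byte R5 is: 


LDL    R1,D.lw(Rx)          ! Traps if unaligned 
INSBL  R5,#D.mod,R3         ! Positions the new byte at byte 0, 1, 2, or 3 
MSKBL  R1,#D.mod,R1         ! Clears the byte being replaced 
BIS    R3,R1,R1             ! Merges the new byte into the longword 
STL    R1,D.lw(Rx)          ! Stores the updated longword 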
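

The arithmetic behind D.lw and D.mod, and the effect of the shift and insert/mask steps, 
can be checked with a small register-level model. The Python sketch below is illustrative 
only: the function names are hypothetical, memory is a little-endian bytearray, and an 
unaligned pointer raises an exception instead of invoking a trap handler. 

```python
MASK64 = (1 << 64) - 1


def split_offset(d):
    """Return (D.lw, D.mod): D rounded down to a multiple of 4, and D mod 4."""
    return (d // 4) * 4, d % 4


def ldl(mem, addr):
    """LDL: load a longword, sign-extended to 64 bits; unaligned traps."""
    if addr % 4:
        raise ValueError("alignment trap")
    return int.from_bytes(mem[addr:addr + 4], "little", signed=True) & MASK64


def stl(mem, addr, reg):
    """STL: store the low longword of reg; unaligned traps."""
    if addr % 4:
        raise ValueError("alignment trap")
    mem[addr:addr + 4] = (reg & 0xFFFFFFFF).to_bytes(4, "little")


def sra(reg, count):
    """SRA: arithmetic right shift of a 64-bit register value."""
    signed = reg - (1 << 64) if reg >> 63 else reg
    return (signed >> count) & MASK64


def load_word_zext(mem, rx, d):
    d_lw, d_mod = split_offset(d)
    r1 = ldl(mem, rx + d_lw)
    return (r1 >> (8 * d_mod)) & 0xFFFF         # EXTWL R1,#D.mod,R1


def load_word_sext(mem, rx, d):
    d_lw, d_mod = split_offset(d)
    r1 = ldl(mem, rx + d_lw)
    r1 = (r1 << (48 - 8 * d_mod)) & MASK64      # SLL R1,#48-8*D.mod,R1
    return sra(r1, 48)                          # SRA R1,#48,R1


def store_byte(mem, rx, d, r5):
    d_lw, d_mod = split_offset(d)
    r1 = ldl(mem, rx + d_lw)
    r3 = (r5 & 0xFF) << (8 * d_mod)             # INSBL R5,#D.mod,R3
    r1 &= ~(0xFF << (8 * d_mod)) & MASK64       # MSKBL R1,#D.mod,R1
    stl(mem, rx + d_lw, r3 | r1)                # BIS R3,R1,R1 then STL


mem = bytearray(b"\x11\x22\x33\x84\x55\x66\x77\x88")
print(hex(load_word_sext(mem, 0, 2)))
store_byte(mem, 0, 5, 0xAB)
print(mem.hex())
```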


A.4.2 Division 


In all implementations, floating-point division is likely to have a substantially longer result 
latency than floating-point multiply. In addition, in many implementations multiplies will be 
pipelined and divides will not. 


Thus, any division by a constant power of two should be compiled as a multiply by the exact 
reciprocal, if it is representable without overflow or underflow. If language rules or surround- 
ing context allow, multiplication by the reciprocal can closely approximate other divisions by 
constants. 


Integer division does not exist as a hardware opcode. Division by a constant can always be 
done via UMULH of another appropriate constant, followed by a right shift. A subroutine can 
do general quadword division by true variables. The subroutine could test for small divisors 
(less than about 1000 in absolute value) and for those, do a table lookup on the exact constant 
and shift count for an UMULH/shift sequence. For the remaining cases, a table lookup on 
about a 1000-entry table and a multiply can give a linear approximation to 1/divisor that is 
accurate to 16 bits. 


Using this approximation, a multiply and a back-multiply and a subtract can generate one 
16-bit quotient digit plus a 48-bit new partial dividend. Three more such steps can generate the 
full quotient. Having prior knowledge of the possible sizes of the divisor and dividend, normal- 
izing away leading bytes of zeros, and performing an early-out test can reduce the average 
number of multiplies to about five (compared to a best case of one and a worst case of nine). 


A.4.3 Byte Swap 


When it is necessary to swap all the bytes of a datum, perhaps because the datum originated 
on a machine of the opposite byte numbering convention, the simplest sequence is to use the 
VAX floating-point load instruction to swap words, followed by an integer sequence to swap 
four pairs of bytes. Assume as shown below that an aligned quadword datum is in memory at 
location X and is to be left in R1 after byte-swapping; temp is an aligned quadword temporary, 
and "." (period) in the comments stands for a byte of zeros. Similar sequences can be used for 
data in registers, sometimes doing the byte swaps first and word swap second: 


                          ; X  = ABCD EFGH 
LDG    F0,X               ; F0 = GHEF CDAB 
STT    F0,temp 
LDQ    R1,temp            ; R1 = GHEF CDAB 
SLL    R1,#8,R2           ; R2 = HEFC DAB. 
SRL    R1,#8,R1           ; R1 = .GHE FCDA 
ZAP    R2,#55(hex),R2     ; R2 = H.F. D.B. 
ZAP    R1,#AA(hex),R1     ; R1 = .G.E .C.A 
OR     R1,R2,R1           ; R1 = HGFE DCBA 


For bulk swapping of arrays, this sequence can be usefully unrolled about four times and 
scheduled, using four different aligned quadword memory temps. 
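

The register contents shown in the comments can be verified with a short model. In the 
Python sketch below, which uses hypothetical function names, swap_words stands in for the 
net word reversal that the listing shows for the LDG/STT/LDQ steps; it does not model the 
floating-point load itself. 

```python
MASK64 = (1 << 64) - 1


def swap_words(x):
    """Reverse the four 16-bit words of a quadword (ABCD EFGH -> GHEF CDAB)."""
    words = [(x >> (16 * i)) & 0xFFFF for i in range(4)]
    return sum(w << (16 * (3 - i)) for i, w in enumerate(words))


def zap(reg, byte_mask):
    """ZAP: zero each byte i of reg whose mask bit i is set."""
    for i in range(8):
        if (byte_mask >> i) & 1:
            reg &= ~(0xFF << (8 * i))
    return reg & MASK64


def byte_swap_quadword(x):
    r1 = swap_words(x)            # R1 = GHEF CDAB
    r2 = (r1 << 8) & MASK64       # R2 = HEFC DAB.
    r1 = r1 >> 8                  # R1 = .GHE FCDA
    r2 = zap(r2, 0x55)            # R2 = H.F. D.B.
    r1 = zap(r1, 0xAA)            # R1 = .G.E .C.A
    return r1 | r2                # R1 = HGFE DCBA


x = 0x0102030405060708
assert byte_swap_quadword(x) == int.from_bytes(x.to_bytes(8, "big"), "little")
```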


A.4.4 Stylized Code Forms 


Using the same stylized code form for a common operation improves the readability of 
compiler output and increases the likelihood that an implementation will speed up the stylized 
form. 


A.4.4.1 NOP 


The universal NOP form is: 


UNOP  ==  LDQ_U  R31,0(Rx) 


In most implementations, UNOP should encounter no operand issue delays, no destination 
issue delay, and no functional unit issue delays. (In some implementations, it may encounter 
an operand issue delay for Rx.) Implementations are free to optimize UNOP into no action and 
zero execution cycles. 


If the actual instruction is encoded as LDQ_U Rn,0(Rx), where n is other than 31, and such an 
instruction generates a memory-management exception, it is UNPREDICTABLE whether 
UNOP would generate the same exception. On most implementations, UNOP does not gener- 
ate memory management exceptions. 


The standard NOP forms are: 


NOP == BIS R31,R31,R31 
FNOP == CPYS F31,F31,F31 


These generate no exceptions. In most implementations, they should encounter no operand 
issue delays and no destination issue delay. Implementations are free to optimize these into no 
action and zero execution cycles. 


A.4.4.2 Clear a Register 


The standard clear register forms are: 


CLR == BIS R31,R31,Rx 
FCLR == CPYS F31,F31,Fx 


These generate no exceptions. In most implementations, they should encounter no operand 
issue delays and no functional unit issue delay. 


A.4.4.3 Load Literal 


The standard load integer literal (ZEXT 8-bit) form is: 


MOV  #lit8,Ry  ==  BIS  R31,lit8,Ry 


The Alpha literal construct in Operate instructions creates a canonical longword constant for 
values 0..255. 


A longword constant stored in an Alpha 64-bit register is in canonical form when bits 
<63:32>=bit <31>. 


A canonical 32-bit literal can usually be generated with one or two instructions, but sometimes 
three instructions are needed. Use the following procedure to determine the offset fields of the 
instructions: 


val  = <sign-extended, 32-bit value> 
low  = val<15:0> 
tmp1 = val - SEXT(low)                      ! Account for LDA instruction 
high = tmp1<31:16> 
tmp2 = tmp1 - SHIFT_LEFT( SEXT(high,16) ) 

if tmp2 NE 0 then 
    ! original val was in range 7FFF8000(hex)..7FFFFFFF(hex) 
    extra = 4000(hex) 
    tmp1  = tmp1 - 40000000(hex) 
    high  = tmp1<31:16> 
else 
    extra = 0 
endif 


The general sequence is: 


LDA Rdst, low(R31) 
LDAH Rdst, extra(Rdst) ! Omit if extra=0 
LDAH Rdst, high(Rdst) ! Omit if high=0 
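

The procedure and the resulting sequence can be checked against each other with a short 
model. The Python sketch below follows the steps above; the function names are 
hypothetical, and run_sequence models LDA and LDAH as adding a sign-extended 16-bit field 
(shifted left 16 bits for LDAH) in a 64-bit register. 

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def literal_fields(val):
    """Return the (low, extra, high) 16-bit fields for a 32-bit value."""
    val = sext(val, 32)
    low = val & 0xFFFF
    tmp1 = val - sext(low, 16)            # account for LDA sign extension
    high = (tmp1 >> 16) & 0xFFFF
    tmp2 = tmp1 - (sext(high, 16) << 16)
    extra = 0
    if tmp2 != 0:                         # val in 7FFF8000..7FFFFFFF (hex)
        extra = 0x4000
        tmp1 -= 0x40000000
        high = (tmp1 >> 16) & 0xFFFF
    return low, extra, high


def run_sequence(low, extra, high):
    """Model the LDA/LDAH/LDAH sequence and return the 64-bit register."""
    rdst = sext(low, 16)                  # LDA  Rdst, low(R31)
    if extra:
        rdst += sext(extra, 16) << 16     # LDAH Rdst, extra(Rdst)
    if high:
        rdst += sext(high, 16) << 16      # LDAH Rdst, high(Rdst)
    return rdst & MASK64


for val in (0, 1, -1, 0x8000, 0x12345678, 0x7FFF8000, 0x7FFFFFFF, -0x80000000):
    assert run_sequence(*literal_fields(val)) == sext(val, 32) & MASK64
```

Only values in the range 7FFF8000(hex)..7FFFFFFF(hex) need all three instructions; for 
these, the single LDAH that would otherwise be required has a field of 8000(hex), which 
sign-extends to a negative value. 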


A.4.4.4 Register-to-Register Move 


The standard register move forms are: 


MOV RX,RY == BIS RX,RX,RY 
FMOV FX,FY == CPYS FX,FX,FY 


These move forms generate no exceptions. In most implementations, these should encounter 
no functional unit issue delay. 


A.4.4.5 Negate 


The standard register negate forms are: 


NEGz   Rx,Ry  ==  SUBz   R31,Rx,Ry    ! z = L or Q 
NEGz   Fx,Fy  ==  SUBz   F31,Fx,Fy    ! z = F, G, S, or T 
FNEGz  Fx,Fy  ==  CPYSN  Fx,Fx,Fy     ! z = F, G, S, or T 


The integer subtract generates no Integer Overflow trap if Rx contains the largest negative 
number (SUBz/V would trap). The floating subtract generates a floating-point exception for a 
non-finite value in Fx. The CPYSN form generates no exceptions. 


A.4.4.6 NOT 


The standard integer register NOT form is: 


NOT  Rx,Ry  ==  ORNOT  R31,Rx,Ry 


This generates no exceptions. In most implementations, this should encounter no functional 
unit issue delay. 


A.4.4.7 Booleans 


The standard alternative to BIS is: 


OR      Rx,Ry,Rz  ==  BIS  Rx,Ry,Rz 


The standard alternative to BIC is: 


ANDNOT  Rx,Ry,Rz  ==  BIC  Rx,Ry,Rz 


The standard alternative to EQV is: 


XORNOT  Rx,Ry,Rz  ==  EQV  Rx,Ry,Rz 


A.4.5 Exception and Trap Barriers 


The EXCB instruction allows software to guarantee that in a pipelined implementation, all pre- 
vious instructions have completed any behavior related to exceptions or rounding modes 
before any instructions after the EXCB are issued. In particular, all changes to the 
Floating-point Control Register (FPCR) are guaranteed to have been made, whether or not there is 
an associated exception. Also, all potential floating-point exceptions and integer overflow 
exceptions are guaranteed to have been taken. 


The TRAPB instruction guarantees that it and any following instructions do not issue until all 
possible preceding traps have been signaled. This does not mean that all preceding instructions 
have necessarily run to completion (for example, a Load instruction may have passed all the 
fault checks but not yet delivered data from a cache miss). 


EXCB is thus a superset of TRAPB. 


A.4.6 Pseudo-Operations (Stylized Code Forms) 


This section summarizes the pseudo-operations for the Alpha architecture that may be used by 
various software components in an Alpha system. Most of these forms are discussed in preced- 
ing sections. 


In the context of this section, pseudo-operations all represent a single underlying machine 
instruction. Each pseudo-operation represents a particular instruction with either replicated 
fields (such as FMOV), or hard-coded zero fields. Since the pattern is distinct, these 
pseudo-operations can be decoded by instruction decode mechanisms. 


In Table A-2, the pseudo-operation codes can be viewed as macros with parameters. The 
formal form is listed in the left column, and the expansion in the code stream is listed in the 
right column. 


Some instruction mnemonics have synonyms. These differ from pseudo-operations in that 
each synonym represents the same underlying instruction with no special encoding of operand 
fields. As a result, synonyms cannot be distinguished from each other. They are not listed in 
the table. Examples of synonyms are: BIC/ANDNOT, BIS/OR, and EQV/XORNOT. 


Table A-2: Decodable Pseudo-Operations (Stylized Code Forms) 


Pseudo-Operation 
in Listing               Meaning                            Actual Instruction Encoding 

BR       target          Branch to target (21-bit signed    BR        R31, target 
                         displacement) 

CLR      Rx              Clear integer register             BIS       R31, R31, Rx 

FABS     Fx, Fy          No-exception generic floating      CPYS      F31, Fx, Fy 
                         absolute value 

FCLR     Fx              Clear a floating-point register    CPYS      F31, F31, Fx 

FMOV     Fx, Fy          Floating-point move                CPYS      Fx, Fx, Fy 

FNEG     Fx, Fy          No-exception generic floating      CPYSN     Fx, Fx, Fy 
                         negation 

FNOP                     Floating-point no-op               CPYS      F31, F31, F31 

MOV      Lit, Rx         Move 16-bit sign-extended          LDA       Rx, lit(R31) 
                         literal to Rx 

MOV      {Rx/Lit8}, Ry   Move Rx/8-bit zero-extended        BIS       R31, {Rx/Lit8}, Ry 
                         literal to Ry 

MF_FPCR  Fx              Move from FPCR                     MF_FPCR   Fx, Fx, Fx 

MT_FPCR  Fx              Move to FPCR                       MT_FPCR   Fx, Fx, Fx 

NEGF     Fx, Fy          Negate F_floating                  SUBF      F31, Fx, Fy 

NEGF/S   Fx, Fy          Negate F_floating, semi-precise    SUBF/S    F31, Fx, Fy 

NEGG     Fx, Fy          Negate G_floating                  SUBG      F31, Fx, Fy 

NEGG/S   Fx, Fy          Negate G_floating,                 SUBG/S    F31, Fx, Fy 
                         semi-precise 

NEGL     {Rx/Lit8}, Ry   Negate longword                    SUBL      R31, {Rx/Lit}, Ry 

NEGL/V   {Rx/Lit8}, Ry   Negate longword with               SUBL/V    R31, {Rx/Lit}, Ry 
                         overflow detection 

NEGQ     {Rx/Lit8}, Ry   Negate quadword                    SUBQ      R31, {Rx/Lit}, Ry 

NEGQ/V   {Rx/Lit8}, Ry   Negate quadword with               SUBQ/V    R31, {Rx/Lit}, Ry 
                         overflow detection 

NEGS     Fx, Fy          Negate S_floating                  SUBS      F31, Fx, Fy 

NEGS/SU  Fx, Fy          Negate S_floating, software        SUBS/SU   F31, Fx, Fy 
                         with underflow detection 

NEGS/SUI Fx, Fy          Negate S_floating, software        SUBS/SUI  F31, Fx, Fy 
                         with underflow and inexact 
                         result detection 

NEGT     Fx, Fy          Negate T_floating                  SUBT      F31, Fx, Fy 

NEGT/SU  Fx, Fy          Negate T_floating, software        SUBT/SU   F31, Fx, Fy 
                         with underflow detection 

NEGT/SUI Fx, Fy          Negate T_floating, software        SUBT/SUI  F31, Fx, Fy 
                         with underflow and inexact 
                         result detection 

NOP                      Integer no-op                      BIS       R31, R31, R31 

NOT      {Rx/Lit8}, Ry   Logical NOT of Rx/8-bit            ORNOT     R31, {Rx/Lit}, Ry 
                         zero-extended literal storing 
                         results in Ry 

SEXTL    {Rx/Lit8}, Ry   Longword sign-extension of Rx      ADDL      R31, {Rx/Lit}, Ry 
                         storing results in Ry 

UNOP                     Universal NOP for both integer     LDQ_U     R31,0(Rx) 
                         and floating-point code 


A.5 Timing Considerations: Atomic Sequences 


A sufficiently long instruction sequence between LDx_L and STx_C will never complete, 
because periodic timer interrupts will always occur before the sequence completes. The follow- 
ing rules describe sequences that will eventually complete in all Alpha implementations: 


• At most 40 operate or conditional-branch (not taken) instructions executed in the 
sequence between LDx_L and STx_C. 


• At most two I-stream TB-miss faults. Sequential instruction execution guarantees this. 


• No other exceptions triggered during the last execution of the sequence. 


Implementation Note: 


On all expected implementations, this allows for about 50 µsec of execution time, even
with 100 percent cache misses. This should satisfy any requirement for a 1-msec timer
interrupt rate.
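
The reasoning behind these limits can be illustrated with a small model. The Python sketch below is a simplification under stated assumptions, not a description of any implementation: it assumes that a timer interrupt arriving during the sequence causes the STx_C to fail, that the retry begins immediately after the interrupt, and that handler time is zero. The function name and parameters are hypothetical.

```python
# Simplified model (assumptions: an interrupt inside the sequence makes STx_C
# fail; the retry starts right at the tick; handler time is ignored).

def attempts_until_complete(seq_us, timer_period_us, start_us=0, max_attempts=100):
    """Return how many tries an LDx_L..STx_C sequence needs, or None if it
    never completes within max_attempts."""
    t = start_us
    for attempt in range(1, max_attempts + 1):
        # Time of the next periodic timer interrupt strictly after t.
        next_tick = (t // timer_period_us + 1) * timer_period_us
        if t + seq_us < next_tick:
            return attempt      # sequence finished before the interrupt
        t = next_tick           # interrupted: retry after the tick
    return None
```

In this model a 50 µsec sequence under a 1000 µsec timer completes on the first or second attempt, while a sequence longer than the timer period never completes.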
A.6 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP -> Alpha
2. Removed commented-out perfmon stuff that went to another appendix
3. Removed FETCH and FETCH_M and added cache line prefetch
4. OpenVMS AXP -> OpenVMS Alpha
5. DEC OSF/1 -> DIGITAL UNIX
6. Windows NT AXP -> Windows NT Alpha
7. Digital -> DIGITAL


Revision 6.0, December 12, 1994
1. Alpha -> Alpha AXP
2. Added ECO 76, Bi-endian support
3. Added ECO 65, Trap shadow rules modification
4. Removed 'no current implementations' note (A_SRM note 175.4)
5. Added ECO 48, UNOP pseudo-op.


Revision 5.0, May 12, 1992
1. Changed cache block sizes
2. Changed DRAINT to TRAPB
3. Converted to SDML
4. Changed MOVQ to MOV for standard load 16 bit literal
5. Changed NEGS and NEGT instruction qualifiers to match SUBS and SUBT qualifiers
6. Modified text describing creation of canonical longword constants


Revision 4.0, August 21, 1991
1. Added Pseudo-op table
2. Typos
3. Change text describing JSR to indicate that PC+displacement*4 calculation will
   produce the low 16 bits of most likely LW target address
4. Change name of NEGz form that operates on F, D, G, S, or T floating types to FNEGz
5. Correct Load Literal code form description of sign-extended 32 bit load.
6. Added floating point data format types to 'Negate' section


Revision 3.0, March 2, 1990
1. Add section on prefetch instructions
2. Minor cleanups to match opcodes in rest of document


Revision 2.0, October 4, 1989
1. Renumber R0 as R31, F0 as F31
2. Show new byte inserts
3. Change Freeze-Thaw to LDQ/L-STQ/C


Revision 1.0, May 23, 1989
1. Reorder and add hardware implementation priorities
2. Add aligned byte/word section
3. Add stylized code form section
4. Add timing considerations section


Revision 0.0, March 15, 1989
1. Initial version
Appendix B


IEEE Floating-Point Conformance


A subset of IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard
754-1985) is provided in the Alpha floating-point instructions. This appendix describes how to
construct a complete IEEE implementation.


The order of presentation parallels the order of the IEEE specification. 


B.1 Alpha Choices for IEEE Options 


Alpha supports IEEE single, double, and optionally (in software) extended double formats.
There is no hardware support for the optional extended double format.


Alpha hardware supports normal and chopped IEEE rounding modes. IEEE plus infinity and 
minus infinity rounding modes can be implemented in hardware or software. 


Alpha hardware does not support optional IEEE software trap enable/disable modes. See the 
following discussion about software support. 


Alpha hardware supports add, subtract, multiply, divide, convert between floating formats, 
convert between floating and integer formats, compare, and square root. Software routines sup- 
port remainder, round to integer in floating-point format, and convert binary to/from decimal. 


In the Alpha architecture, copying without change of format is not considered an operation. 
(LDx, CPYSx, and STx do not check for non-finite numbers; an operation would.) Compilers 
may generate ADDx F31,Fx,Fy to get the opposite effect. 


Optional operations for differing formats are not provided. 


The Alpha choice is that the accuracy provided by conversions between decimal strings and 
binary floating-point numbers will meet or exceed IEEE standard requirements. It is imple- 
mentation dependent whether the software binary/decimal conversions beyond 9 or 17 digits 
treat any excess digits as zeros. 
Overflow and underflow, NaNs, and infinities encountered during software binary to decimal
conversion return strings that specify the conditions. 


Alpha hardware supports comparisons of same-format numbers. Software supports compari- 
sons of different-format numbers. 


In the Alpha architecture, results are true-false in response to a predicate. 


Alpha hardware supports the required six predicates and the optional unordered predicate. The 
other 19 optional predicates can be constructed from sequences of two comparisons and two 
branches. 


Alpha hardware supports infinity arithmetic with the compare instructions (CMPTyy). When a 
/S qualifier is included, Alpha hardware may optionally support infinity arithmetic when infin- 
ity operands are encountered and, together with overflow disable (OVFD) and division by 
zero disable (DZED), when infinity is to be generated from finite operands. Otherwise, Alpha 
hardware supports infinity arithmetic by trapping. That is the case when an infinity operand is 
encountered and when an infinity is to be created from finite operands by overflow or division 
by zero. An OS completion handler (interposed between the hardware and the IEEE user) pro- 
vides correct infinity arithmetic. 


When a /S qualifier is included, Alpha hardware may optionally support NaNs and invalid
operations, controlled by the INVD option. Otherwise, Alpha hardware supports NaNs and 
invalid operations by trapping when a NaN operand is encountered and when a NaN is to be 
created. An OS completion handler (interposed between the hardware and the IEEE user) pro- 
vides correct Signaling and Quiet NaN behavior. 


In the Alpha architecture, Quiet NaNs do not afford retrospective diagnostic information. 


In the Alpha architecture, copying a Signaling NaN without a change of format does not signal 
an invalid exception (LDx, CPYSx, and STx do not check for non-finite numbers). Compilers 
may generate ADDx F31,Fx,Fy to get the opposite effect. 


Alpha hardware fully supports negative zero operands and follows the IEEE rules for creating 
negative zero results except for underflow. When a /S qualifier is included, Alpha hardware 
may optionally support underflow and denormalized numbers, controlled by the UNFD 
option. Otherwise, Alpha hardware supports underflow and denormalized numbers by trap- 
ping when a denormalized operand is encountered, when a denormalized result is created, and 
when an underflow occurs. An OS completion handler (interposed between the hardware and 
the IEEE user) provides correct denormalized and underflow arithmetic. 


Except for the optional trap disable bits in the FPCR, Alpha hardware does not supply IEEE 
exception trap behavior; the hardware traps are a superset of the IEEE-required conditions. An 
OS completion handler (interposed between the hardware and the IEEE user) provides correct 
IEEE exception behavior. 


In the Alpha architecture, tininess is detected by hardware after rounding, and loss of accuracy 
is detected by software as an inexact result. 
In the Alpha architecture, user signal handlers are supported by compilers and an OS comple-
tion handler (interposed between the hardware and the IEEE user), as described in the next 
section. 


B.2 Alpha Support for OS Completion Handlers


Alpha floating-point trap behavior is statically controlled by the /S, /U, and /I mode qualifiers 
on floating-point instructions. Changing these options usually requires recompiling. Instruc- 
tions with any valid qualifier combination that includes the /S qualifier can be dynamically 
controlled by the optional trap disable bits and denormal control bits in the FPCR. 


Each Alpha implementation may choose how to distribute support for the completion modes
(/S, /SU, /SV, /SUI, and /SVI) between hardware and software. An implementation may
minimize hardware complexity by trapping to implementation software for support of excep- 
tions and non-finites. An implementation may choose increased floating-point performance at 
the cost of increased hardware complexity by providing hardware support for exceptions and 
non-finites. 


However completion mode support is distributed, application software on any system that 
meets the Alpha architecture specification will see consistent floating-point semantics because 
Alpha implementation software provides support for any floating-point feature that is not 
directly supported by the hardware. 


Each Alpha operating system must include an OS completion handler that does software com- 
pletion of instructions that have any valid qualifier combination that includes the /S qualifier, 
and that finishes the computation of any floating-point operation that is not completed by the 
hardware. The OS completion handler is responsible for providing the result specified by the 
architecture. The handler either continues execution of the application program or signals an 
exception to the application. 


If the exception summary parameter of an arithmetic trap indicates that an instruction requir- 
ing software completion caused the trap, the operating system must finish the operation. An 
OS completion handler uses the register write mask parameter to ignore instructions in the trap 
shadow and to locate the trigger instruction of the arithmetic trap. The handler then uses the 
trigger instruction input register values to compute the result in the output register and to 
record any appropriate signal status. The handler then continues execution with the instruction 
following the trigger instruction, unless the application has requested execution of an optional 
signal handler. 


It is recommended that the OS completion handler report an enabled IEEE exception to the 
user application as a fault, rather than as a trap. When reported as a fault, the reported PC 
points to the trigger instruction, rather than after the trigger instruction. Regardless of whether 
an enabled fault occurs, it is recommended that the completion trap handler set the result regis- 
ter and status flags to the IEEE standard nontrapping results, as defined in the IEEE Standard 
section in Common Architecture, Chapter 4. That behavior makes it possible for the user appli- 
cation to continue from a fault by stepping over the trigger instruction. 


The Floating-Point Control Register (FPCR) contains several trap disable bits and denormal
control bits. Implementation of these bits in the FPCR is optional. A system that includes
these bits may choose to complete computations involving non-finite values without the
assistance of software completion. Operating systems use these FPCR bits to enable hardware
completion of instructions with any valid qualifier combination that includes /S in those cases
where the operating system does not require a trap to do exception signaling.


To get the optional full IEEE user trap handler behavior, an OS completion handler must be 
provided that implements the exception status flags, dynamic user trap handler disabling, han- 
dler saving and restoring, default behavior for disabled user trap handlers, and linkages that 
allow a user handler to return a substitute result. OS completion handlers can use the 
FP_Control quadword, along with the floating-point control register (FPCR), to provide vari- 
ous levels of IEEE-compliant behavior. 

OS completion handlers provide two options for special handling of denormal numbers in 
instructions that are compiled with any valid qualifier combination that includes the /S quali- 
fier. These options are controlled by bits defined by implementation software in the IEEE 
Floating-Point Control (FP_C) Quadword. 


• The first option maps all denormal results to a true zero value. That option is useful for
improving the performance of IEEE compliant code that does not need gradual underflow
and for mixing IEEE instructions that both include and do not include the /S qualifier.

• A second option treats all denormal input operands as if they were signed zeros. That
option is useful for improving the performance of IEEE compliant code that encounters
spurious denormal values in uninitialized data.


The optional UNDZ and DNZ (denormal control) bits in the FPCR can assist hardware to 
improve the performance of these denormal handling options. 
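
The two denormal-handling options can be illustrated on T_floating (IEEE double) values with a short Python sketch. It is an illustration of the arithmetic effect only, not of how an OS completion handler is written; the function names are hypothetical, and "true zero" is taken here to mean positive zero, which is an assumption.

```python
import math
import sys

# Illustrative sketch; function names are hypothetical.
# sys.float_info.min is the smallest positive *normal* double, so any nonzero
# magnitude below it is a denormal.

def is_denormal(x):
    return x != 0.0 and abs(x) < sys.float_info.min


def map_denormal_result_to_zero(result):
    """First option: a denormal result becomes a true zero (assumed +0)."""
    return 0.0 if is_denormal(result) else result


def treat_denormal_operand_as_zero(operand):
    """Second option: a denormal input is read as a zero of the same sign."""
    return math.copysign(0.0, operand) if is_denormal(operand) else operand
```

With these definitions, the operand -5e-324 (the smallest-magnitude negative denormal) is read as -0.0 under the second option, while normal values, infinities, and NaNs pass through unchanged.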


B.2.1 IEEE Floating-Point Control (FP_C) Quadword


Operating system implementations provide the following support for an IEEE floating-point 
control quadword (FP_C), illustrated in Figure B—1 and described in Table B-1. 


Figure B-1: IEEE Floating-Point Control (FP_C) Quadword


Bits    63-23     22    21    20    19    18    17    16-7      6     5     4     3     2     1     0
Field   Reserved  DNOS  INES  UNFS  OVFS  DZES  INVS  Reserved  DNOE  INEE  UNFE  OVFE  DZEE  INVE  Reserved


• The operating system software completion mechanism maintains the FP_C. Therefore,
the FP_C affects (and is affected by) only those instructions with any valid qualifier
combination that includes the /S qualifier.


• The FP_C quadword is context switched when the operating system switches the thread
context. (The FP_C can be placed in a currently switched data structure.)


• Although the operating system can keep the FP_C in a user mode memory location,
user code may not directly access the FP_C.


• Integer overflow (IOV) exceptions are controlled by the INVE enable mask bit
(FP_C<1>), as allowed by the IEEE standard. Implementation software is responsible
for setting the INVS status bit (FP_C<17>) when a CVTTQ or CVTQL instruction
traps into the software completion mechanism for integer overflow.


• At process creation, all trap enable flags in the FP_C are clear. The settings of other
FP_C bits, defined in Table B-1 as reserved for implementation software, are defined
by operating system software.


At other events such as forks or thread creation, and at asynchronous routine calls such as 
traps and signals, the operating system controls all assigned FP_C bits and those defined as 
reserved for implementation software. 


Table B-1: Floating-Point Control (FP_C) Quadword Bit Summary 


Bit Description 


63-48 Reserved for implementation software. 
47-23 Reserved for future architecture definition. 


22 Denormal operand status (DNOS)
A floating arithmetic or conversion operation used a denormal operand value. 
This status field is left unchanged if the system is treating denormal operand val- 
ues as if they were signed zero values. If an operation with a denormal operand 
causes other exceptions, all appropriate status bits are set. 


21 Inexact result status (INES) 
A floating arithmetic or conversion operation gave a result that differed from the
mathematically exact result.


20 Underflow status (UNFS) 
A floating arithmetic or conversion operation underflowed the destination expo- 
nent. 


19 Overflow status (OVFS) 
A floating arithmetic or conversion operation overflowed the destination expo- 
nent. 


18 Division by zero status (DZES) 
An attempt was made to perform a floating divide operation with a divisor of zero. 


17 Invalid operation status (INVS) 
An attempt was made to perform a floating arithmetic, conversion, or comparison 
operation, and one or more of the operand values were illegal. 


16-12 Reserved for implementation software. 


11-7 Reserved for future architecture definition. 


6 Denormal operand exception enable (DNOE) 
Initiate an INV exception if a floating arithmetic or conversion operation involves 
a denormal operand value. This exception does not signal if the system is treating 
denormal operand values as if they were signed zero values. If an operation can 
initiate more than one enabled exception, the denormal operand exception has pri- 
ority. 


The x86 architecture also supports the denormal operand exception, but does not
give this exception the same priority as Alpha. Alpha software that emulates x86
instructions may need to supply a denormal exception handler that provides the
x86 behavior.


5 Inexact result enable (INEE) 
Initiate an INE exception if the result of a floating arithmetic or conversion opera- 
tion differs from the mathematically exact result. 


4 Underflow enable (UNFE) 
Initiate a UNF exception if a floating arithmetic or conversion operation under- 
flows the destination exponent. 


3 Overflow enable (OVFE) 
Initiate an OVF exception if a floating arithmetic or conversion operation over- 
flows the destination exponent. 


2 Division by zero enable (DZEE) 
Initiate a DZE exception if an attempt is made to perform a floating divide opera- 
tion with a divisor of zero. 


1 Invalid operation enable (INVE) 
Initiate an INV exception if an attempt is made to perform a floating arithmetic, 
conversion, or comparison operation, and one or more of the operand values is 
illegal. 


0 Reserved for implementation software. 
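
The bit assignments in Table B-1 can be expressed as a small decoder. The Python sketch below is illustrative only: the names STATUS_BITS, ENABLE_BITS, decode_fp_c, and set_status are hypothetical, DNOS is taken to be bit 22 as drawn in Figure B-1, and user code cannot access the FP_C directly, so a real handler would live inside operating system software.

```python
# Illustrative decoder for the FP_C layout in Table B-1.
# All names are hypothetical; DNOS is assumed to be bit 22 (per Figure B-1).

STATUS_BITS = {"DNOS": 22, "INES": 21, "UNFS": 20, "OVFS": 19, "DZES": 18, "INVS": 17}
ENABLE_BITS = {"DNOE": 6, "INEE": 5, "UNFE": 4, "OVFE": 3, "DZEE": 2, "INVE": 1}


def decode_fp_c(fp_c):
    """Return a dict mapping each defined field name to True or False."""
    fields = {**STATUS_BITS, **ENABLE_BITS}
    return {name: bool((fp_c >> bit) & 1) for name, bit in fields.items()}


def set_status(fp_c, name):
    """Return fp_c with the named status bit set (status bits are sticky here)."""
    return fp_c | (1 << STATUS_BITS[name])
```

Note that in this layout each status bit sits exactly 16 bit positions above its matching enable bit (for example, INVS at 17 and INVE at 1).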


B.3 Mapping to IEEE Standard


There are five IEEE exceptions, each of which can be "IEEE software trap-enabled" or dis- 
abled (the default condition). Implementing the IEEE software trap-enabled mode is optional 
in the IEEE standard. 


The assumption, therefore, is that the only access to IEEE-specified software trap-enabled 
results will be generated in assembly language code. The following design allows this, but 
only if such assembly language code has TRAPB instructions after each floating-point instruc- 
tion, and generates the IEEE-specified scaled result in a trap handler by emulating the 
instruction that was trapped by hardware overflow/underflow detection, using the original 
operands. 
There is a set of detailed IEEE-specified result values, both for operations that are specified to
raise IEEE traps and those that do not. This behavior is created on Alpha by four layers of
hardware, PALcode, the operating-system completion handler, and the user signal handler, as
shown in Figure B-2.


Figure B-2: IEEE Trap Handling Behavior 


Hardware
    |   Traps to PALcode
    v
PALcode
    |   Traps to Operating System
    v
Operating System
    |   Traps to User IEEE Trap Handler (IEEE Standard)
    v
User Signal Handler


The IEEE-specified trap behavior occurs only with respect to the user signal handler (the last 
layer in Figure B—2); any trap-and-fixup behavior in the first three layers is outside the scope 
of the IEEE standard. 


















The IEEE number system is divided into finite and non-finite numbers:

• The finites are normal numbers:
  -MAX..-MIN, -0, 0, +MIN..+MAX

• The non-finites are:
  Denormals, +/- Infinity, Signaling NaN, Quiet NaN


Alpha hardware must treat minus zero operands and results as special cases, as required by the 
IEEE standard. 


If the DNZ (denormal operands to zero) bit in the FPCR is set or if the OS completion handler 
is treating denormal operands as zero, then IEEE trap handling is done as if each denormal 
operand had the corresponding signed zero value. 


Table B—2 specifies, for the IEEE /S qualifier modes, which layer does each piece of trap han- 
dling. The table describes where the hardware and PALcode can trap to the OS completion 
handler. However, for IEEE operations with any valid qualifier combination that includes the 
/S qualifier, the system may choose not to trap to the OS completion handler, provided that 
any applicable exception is disabled by the trap disable bits in the FPCR and the hardware and 
PALcode can produce the expected IEEE result as modified by the denormal control bits in 
the FPCR. See Common Architecture, Chapter 4, for more detail on the hardware instruction 
descriptions. 
Table B-2: IEEE Floating-Point Trap Handling


Alpha Instructions                Hardware¹                 PALcode  OS Completion Handler             User Signal Handler

FBEQ FBNE FBLT FBLE FBGT FBGE     Bits Only, No Exceptions
LDS LDT                           Bits Only, No Exceptions
STS STT                           Bits Only, No Exceptions
CPYS CPYSN                        Bits Only, No Exceptions
FCMOVx                            Bits Only, No Exceptions

ADDx SUBx INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply sum                        [Denormal Op²]
  +/-Inf operand                  Trap                      Trap     Supply sum                        -
  QNaN operand                    Trap                      Trap     Supply QNaN                       -
  SNaN operand                    Trap                      Trap     Supply QNaN                       [Invalid Op]
  +Inf + -Inf                     Trap                      Trap     Supply QNaN                       [Invalid Op]
ADDx SUBx OUTPUT Exceptions:
  Exponent overflow               Trap                      Trap     Supply +/-Inf, +/-MAX             [Overflow³] Scale by bias adjust
  Exponent underflow and disabled Supply +0                 -        -                                 -
  Exponent underflow and enabled  Supply +0 and trap        Trap     Supply +/-MIN, denorm, +/-0       [Underflow³] Scale by bias adjust
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply sum and trap       Trap     -                                 [Inexact]

MULx INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply prod.                      [Denormal Op²]
  +/-Inf operand                  Trap                      Trap     Supply prod.                      -
  QNaN operand                    Trap                      Trap     Supply QNaN                       -
  SNaN operand                    Trap                      Trap     Supply QNaN                       [Invalid Op]
  0 * Inf                         Trap                      Trap     Supply QNaN                       [Invalid Op]
MULx OUTPUT Exceptions:
  Exponent overflow               Trap                      Trap     Supply +/-Inf, +/-MAX             [Overflow³] Scale by bias adjust
  Exponent underflow and disabled Supply +0                 -        -                                 -
  Exponent underflow and enabled  Supply +0 and trap        Trap     Supply +/-MIN, denorm, +/-0       [Underflow³] Scale by bias adjust
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply prod. and trap     Trap     -                                 [Inexact]

DIVx INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply quot.                      [Denormal Op²]
  +/-Inf operand                  Trap                      Trap     Supply quot.                      -
  QNaN operand                    Trap                      Trap     Supply QNaN                       -
  SNaN operand                    Trap                      Trap     Supply QNaN                       [Invalid Op]
  0/0 or Inf/Inf                  Trap                      Trap     Supply QNaN                       [Invalid Op]
  A/0                             Trap                      Trap     Supply +/-Inf                     [Div. Zero]
DIVx OUTPUT Exceptions:
  Exponent overflow               Trap                      Trap     Supply +/-Inf, +/-MAX             [Overflow³] Scale by bias adjust
  Exponent underflow and disabled Supply +0                 -        -                                 -
  Exponent underflow and enabled  Supply +0 and trap        Trap     Supply +/-MIN, denorm, +/-0       [Underflow³] Scale by bias adjust
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply quot. and trap     Trap     -                                 [Inexact]

CMPTEQ CMPTUN INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply (=)                        [Denormal Op²]
  QNaN operand                    Trap                      Trap     Supply False for EQ, True for UN  -
  SNaN operand                    Trap                      Trap     Supply False/True                 [Invalid Op]

CMPTLT CMPTLE INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply ≤ or <                     [Denormal Op²]
  QNaN operand                    Trap                      Trap     Supply False                      [Invalid Op]
  SNaN operand                    Trap                      Trap     Supply False                      [Invalid Op]

CVTfi INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply Cvt                        [Denormal Op²]
  +/-Inf operand                  Trap                      Trap     Supply 0                          [Invalid Op]
  QNaN operand                    Trap                      Trap     Supply 0                          -
  SNaN operand                    Trap                      Trap     Supply 0                          [Invalid Op]
CVTfi OUTPUT Exceptions:
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply Cvt and trap       Trap     -                                 [Inexact]
  Integer overflow                Supply Trunc. result      Trap     -                                 [Invalid Op⁵]
                                  and trap if enabled

CVTif OUTPUT Exceptions:
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply Cvt and trap       Trap     -                                 [Inexact]

CVTff INPUT Exceptions:
  Denormal operand                Trap                      Trap     Supply Cvt                        [Denormal Op²]
  +/-Inf operand                  Trap                      Trap     Supply Cvt                        -
  QNaN operand                    Trap                      Trap     Supply QNaN                       -
  SNaN operand                    Trap                      Trap     Supply QNaN                       [Invalid Op]
CVTff OUTPUT Exceptions:
  Exponent overflow               Trap                      Trap     Supply +/-Inf, +/-MAX             [Overflow³] Scale by bias adjust
  Exponent underflow and disabled Supply +0                 -        -                                 -
  Exponent underflow and enabled  Supply +0 and trap        Trap     Supply +/-MIN, denorm, +/-0       [Underflow³] Scale by bias adjust
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply Cvt and trap       Trap     -                                 [Inexact]

SQRTx INPUT Exceptions:
  Negative nonzero operand        Trap                      Trap     Supply QNaN                       [Invalid Op]
  +/-0                            Supply +/-0               -        -                                 -
  + Denormal operand              Trap                      Trap     Supply SQRT                       [Denormal Op²]
  - Denormal operand              Trap                      Trap     Supply QNaN                       [Denormal Op², Invalid Op]
  + Infinity operand              Trap                      Trap     Supply +Inf                       -
  - Infinity operand              Trap                      Trap     Supply QNaN                       [Invalid Op]
  QNaN operand                    Trap                      Trap     Supply QNaN                       -
  SNaN operand                    Trap                      Trap     Supply QNaN                       [Invalid Op]
SQRTx OUTPUT Exceptions:
  Exponent overflow               Not possible
  Exponent underflow              Not possible
  Inexact and disabled            -                         -        -                                 -
  Inexact and enabled             Supply SQRT               Trap     -                                 [Inexact]


1  This column describes the minimum necessary hardware support.
2  [Denormal Op] signals have priority over all other signals.
3  [Overflow] and [Underflow] signals have priority over [Inexact] signals.
4  An implementation could choose instead to trap to PALcode and have the PALcode
   supply a zero result on all underflows.
5  An implementation could choose instead to trap to PALcode on extreme values and
   have the PALcode supply a truncated result on all overflows.

Other IEEE operations (software subroutines or sequences of instructions) are listed here for 
completeness: 

Remainder 

Round float to integer-valued float 

Convert binary to/from decimal 

Compare, other combinations than the four above 
Table B-3 shows the IEEE standard charts. In the charts, the second column is the result when
the user signal handler is disabled; the third column is the result when that handler is enabled. 
The OS completion handler supplies the IEEE default that is specified in the second column. 
The contents of the Alpha registers contain sufficient information for an enabled user handler 
to compute the value in the third column. 


Table B-3: IEEE Standard Charts


                                    User Signal Handler          User Signal Handler
Exception                           Disabled (IEEE Default)      Enabled (Optional)

Invalid Operation
  (1) Input signaling NaN           Quiet NaN
  (2) Mag. subtract Inf.            Quiet NaN
  (3) 0 * Inf.                      Quiet NaN
  (4) 0/0 or Inf/Inf                Quiet NaN
  (5) x REM 0 or Inf REM y          Quiet NaN
  (6) SQRT(negative non-zero)       Quiet NaN
  (7) Cvt to int(ovfl)              Low-order bits
  (8) Cvt to int(Inf, NaN)          0
  (9) Compare unordered             Quiet NaN

Division by Zero
  x/0, x finite <>0                 +/-Inf

Overflow
  Round nearest                     +/-Inf                       Res/2**192 or 1536
  Round to zero                     +/-MAX                       Res/2**192 or 1536
  Round to -Inf                     +MAX/-Inf                    Res/2**192 or 1536
  Round to +Inf                     +Inf/-MAX                    Res/2**192 or 1536

Underflow
  Underflow                         0/denorm                     Res*2**192 or 1536

Inexact
  Inexact                           Rounded                      Res
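
The third-column entries for overflow and underflow are scaled results: the exact result divided (overflow) or multiplied (underflow) by 2**1536 for the double format, which brings it back into the representable range. The Python sketch below illustrates the arithmetic for T_floating only, on the assumption that 1536 is the double-format adjustment and 192 the single-format one, as in IEEE 754-1985. The function names are hypothetical, and exact rational arithmetic stands in for the emulation a trap handler would perform on the original operands.

```python
from fractions import Fraction

# Illustrative sketch for T_floating (IEEE double) only; names are hypothetical.
# Assumes the double-format bias adjust is 1536 (192 would apply to single).
T_BIAS_ADJUST = 1536


def scaled_overflow_product(a, b):
    """Exact a*b divided by 2**1536, rounded once to double."""
    exact = Fraction(a) * Fraction(b)          # exact, no overflow
    return float(exact / 2 ** T_BIAS_ADJUST)   # correctly rounded conversion


def scaled_underflow_product(a, b):
    """Exact a*b multiplied by 2**1536, rounded once to double."""
    exact = Fraction(a) * Fraction(b)
    return float(exact * 2 ** T_BIAS_ADJUST)
```

For instance, 1e200 * 1e200 overflows to infinity in double arithmetic, but scaled_overflow_product(1e200, 1e200) returns a finite value from which a user handler can recover the true magnitude by accounting for the 2**1536 factor.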


B.4 Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP -> Alpha
2. Added ECO 103, 104
3. /Software -> /S (exception completion)
4. modifier -> qualifier
5. Footnoted Table B-2 for minimum hardware support


Revision 6.0, December, 1994
1. Alpha -> Alpha AXP
2. Fix Section B.1 and B.2 for A_SRM notes 155.1 and 155.7 (respectively)
3. Added floating-point control (FP_C) quadword
4. Corrected Table B-2 and B-3


Revision 5.0, May 12, 1992
1. Reconciled TBDs
2. Changed DRAINT to TRAPB
3. Converted to SDML


Revision 4.0, August 21, 1990
1. Remove input exceptions for -0. This should have been removed in revision 3.0
2. Typos
3. Change 'IEEE user' to 'user IEEE' in section Mapping to IEEE Standard
4. Specified T floating point data type for CMP instructions and eliminated '+/-Inf
   operand' input exception from these instructions


Revision 3.0, March 2, 1990
1. Revise and simplify IEEE trap behavior


Revision 2.0, October 4, 1989
1. Initial version
Appendix C


Instruction Summary


This appendix summarizes all instructions and opcodes in the Alpha architecture. All values
are in hexadecimal radix.


C.1 Common Architecture Instruction Summary


This section summarizes all common Alpha instructions. Table C-1 describes the contents of
the Format and Opcode columns in Table C-2.


Table C-1: Instruction Format and Opcode Notation


Instruction         Format    Opcode
Format              Symbol    Notation    Meaning

Branch              Bra       oo          oo is the 6-bit opcode field
Floating-point      F-P       oo.fff      oo is the 6-bit opcode field
                                          fff is the 11-bit function code field
Memory              Mem       oo          oo is the 6-bit opcode field
Memory/func code    Mfc       oo.ffff     oo is the 6-bit opcode field
                                          ffff is the 16-bit function code in the displacement field
Memory/branch       Mbr       oo.h        oo is the 6-bit opcode field
                                          h is the high-order two bits of the displacement field
Operate             Opr       oo.ff       oo is the 6-bit opcode field
                                          ff is the 7-bit function code field
PALcode             Pcd       oo          oo is the 6-bit opcode field; the particular
                                          PALcode instruction is specified in the
                                          26-bit function code field.
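
The notation in Table C-1 can be recovered from a 32-bit instruction word by extracting the named fields. The Python sketch below is illustrative: the function name is hypothetical, and the bit positions it uses (opcode in <31:26>, operate function in <11:5>, floating-point function in <15:5>, displacement in <15:0>) are taken from the instruction format definitions in Common Architecture rather than from this appendix, so they should be checked against that chapter.

```python
# Illustrative sketch; `opcode_notation` is a hypothetical helper.
# Field positions are assumptions drawn from the instruction format
# definitions in Common Architecture, not from Table C-1 itself.

def opcode_notation(word, fmt):
    """Return the Table C-1 opcode notation (hex text) for a 32-bit word."""
    oo = (word >> 26) & 0x3F                         # 6-bit opcode field
    if fmt in ("Bra", "Mem", "Pcd"):
        return f"{oo:02X}"
    if fmt == "F-P":                                 # 11-bit function code
        return f"{oo:02X}.{(word >> 5) & 0x7FF:03X}"
    if fmt == "Mfc":                                 # 16-bit function in displacement
        return f"{oo:02X}.{word & 0xFFFF:04X}"
    if fmt == "Mbr":                                 # high two bits of displacement
        return f"{oo:02X}.{(word >> 14) & 0x3:X}"
    if fmt == "Opr":                                 # 7-bit function code
        return f"{oo:02X}.{(word >> 5) & 0x7F:02X}"
    raise ValueError(f"unknown format symbol: {fmt}")
```

Under these assumptions, the word 0x47FF041F (the NOP encoding BIS R31, R31, R31) decodes in Opr format to 11.20, matching the BIS entry in Table C-2.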


Table C—2 shows qualifiers for operate format instructions. Qualifiers for IEEE and VAX 
floating-point instructions are shown in Sections C.2 and C.3, respectively. 
Table C-2: Common Architecture Instructions


Mnemonic    Format    Opcode    Description

ADDF        F-P       15.080    Add F_floating
ADDG        F-P       15.0A0    Add G_floating
ADDL        Opr       10.00     Add longword
ADDL/V                10.40
ADDQ        Opr       10.20     Add quadword
ADDQ/V                10.60
ADDS        F-P       16.080    Add S_floating
ADDT        F-P       16.0A0    Add T_floating
AMASK       Opr       11.61     Architecture mask
AND         Opr       11.00     Logical product
BEQ         Bra       39        Branch if = zero
BGE         Bra       3E        Branch if ≥ zero
BGT         Bra       3F        Branch if > zero
BIC         Opr       11.08     Bit clear
BIS         Opr       11.20     Logical sum
BLBC        Bra       38        Branch if low bit clear
BLBS        Bra       3C        Branch if low bit set
BLE         Bra       3B        Branch if ≤ zero
BLT         Bra       3A        Branch if < zero
BNE         Bra       3D        Branch if ≠ zero
BR          Bra                 Unconditional branch
BSR         Mbr                 Branch to subroutine
CALL_PAL    Pcd                 Trap to PALcode
CMOVEQ      Opr                 CMOVE if = zero
CMOVGE      Opr                 CMOVE if ≥ zero
CMOVGT      Opr                 CMOVE if > zero
CMOVLBC     Opr                 CMOVE if low bit clear
CMOVLBS     Opr                 CMOVE if low bit set
CMOVLE      Opr                 CMOVE if ≤ zero
CMOVLT      Opr                 CMOVE if < zero
CMOVNE      Opr                 CMOVE if ≠ zero
CMPBGE      Opr                 Compare byte
CMPEQ       Opr                 Compare signed quadword equal
CMPGEQ      F-P                 Compare G_floating equal
CMPGLE      F-P                 Compare G_floating less than or equal
CMPGLT      F-P                 Compare G_floating less than
CMPLE       Opr                 Compare signed quadword less than or equal
CMPLT       Opr                 Compare signed quadword less than
CMPTEQ      F-P                 Compare T_floating equal
CMPTLE      F-P                 Compare T_floating less than or equal
CMPTLT      F-P                 Compare T_floating less than
CMPTUN      F-P                 Compare T_floating unordered
CMPULE      Opr                 Compare unsigned quadword less than or equal
CMPULT      Opr                 Compare unsigned quadword less than
CPYS        F-P                 Copy sign
CPYSE       F-P                 Copy sign and exponent
CPYSN       F-P                 Copy sign negate
CTLZ        Opr                 Count leading zero
CTPOP       Opr                 Count population
CTTZ        Opr                 Count trailing zero
CVTDG       F-P                 Convert D_floating to G_floating
CVTGD       F-P                 Convert G_floating to D_floating
CVTGF                           Convert G_floating to F_floating
Table C-2: Common Architecture Instructions (Continued)


Mnemonic 


CVTGQ 
CVTLQ
CVTQF
CVTQG 
CVTQL 
CVTQS 
CVTQT 
CVTST 
CVTTQ 
CVTTS 
DIVF 
DIVG 
DIVS 
DIVT 

ECB 

EQV 
EXCB 
EXTBL 
EXTLH 
EXTLL 
EXTQH 
EXTQL 
EXTWH 
EXTWL 
FBEQ 
FBGE 
FBGT 
FBLE 
FBLT 
FBNE 
FCMOVEQ 
FCMOVGE 
FCMOVGT 
FCMOVLE 
FCMOVLT 
FCMOVNE 
FETCH 
FETCH_M 
FTOIS 
FTOIT 
IMPLVER 
INSBL 
INSLH 
INSLL 
INSQH 
INSQL 
INSWH 
INSWL 
ITOFF 
ITOFS 
ITOFT 
JMP 

JSR 


Format 


F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
F-P
Mfc
Opr
Mfc
Opr
Opr
Opr
Opr
Opr
Opr
Opr
Bra
Bra
Bra
Bra
Bra
Bra
F-P
F-P
F-P
F-P
F-P
F-P
Mfc
Mfc
F-P
F-P
Opr
Opr
Opr
Opr
Opr
Opr
Opr
Opr


F-P 
F-P 
F-P 
Mbr 
Mbr 


JSR_COROUTINE Mbr 


Opcode 


15.0AF 
17.010 
15.0BC 
15.0BE 
17.030 
16.0BC 
16.0BE 
16.2AC 
16.0AF 
16.0AC 
15.083 
15.0A3 
16.083 
16.0A3 
18.E800 
11.48 
18.0400 
12.06 
12.6A 
12.26 
12.7A 
12.36 
12.5A 
12.16 
31

36

37

33

32

35
17.02A 
17.02D 
17.02F 
17.02E 
17.02C 
17.02B 
18.8000 
18.A000 
1C.78 
1C.70 
11.6C 
12.0B 
12.67 
12.2B 
12.77 
12.3B 
12.57
12.1B 
14.014 
14.004 
14.024 
1A.0 
1A.1 
1A.3 
Description


Convert G_floating to quadword 
Convert longword to quadword 
Convert quadword to F_floating 
Convert quadword to G_floating 
Convert quadword to longword 
Convert quadword to S_floating 
Convert quadword to T_floating 
Convert S_floating to T_floating 
Convert T_floating to quadword 
Convert T_floating to S_floating 
Divide F_floating 

Divide G_floating 

Divide S_floating

Divide T_floating 

Evict cache block 

Logical equivalence 

Exception barrier 

Extract byte low 

Extract longword high 

Extract longword low 

Extract quadword high 

Extract quadword low 

Extract word high 

Extract word low 

Floating branch if = zero
Floating branch if ≥ zero
Floating branch if > zero
Floating branch if ≤ zero
Floating branch if < zero
Floating branch if ≠ zero
FCMOVE if = zero

FCMOVE if ≥ zero

FCMOVE if > zero

FCMOVE if ≤ zero

FCMOVE if < zero

FCMOVE if ≠ zero

Prefetch data 

Prefetch data, modify intent 
Floating to integer move, S_floating 
Floating to integer move, T_floating 
Implementation version 

Insert byte low 

Insert longword high 

Insert longword low 

Insert quadword high 

Insert quadword low 

Insert word high 

Insert word low 

Integer to floating move, F_floating 
Integer to floating move, S_floating 
Integer to floating move, T_floating 
Jump 

Jump to subroutine 

Jump to subroutine return 
Table C-2: Common Architecture Instructions (Continued)


Mnemonic 


LDA 
LDAH 
LDBU 
LDWU 
LDF 
LDG 
LDL 
LDL_L 
LDQ 
LDQ_L
LDQ_U 
LDS 

LDT 
MAXSB8 
MAXSW4 
MAXUB8 
MAXUW4 
MB 
MF_FPCR 
MINSB8 
MINSW4 
MINUB8 
MINUW4 
MSKBL 
MSKLH 
MSKLL 
MSKQH 
MSKQL
MSKWH 
MSKWL 
MT_FPCR 
MULF 
MULG 
MULL 
MULL/V 
MULQ 
MULQ/V 
MULS 
MULT 
ORNOT 
PERR 
PKLB 
PKWB 
RC 

RET 
RPCC 

RS 
S4ADDL 
S4ADDQ 
S4SUBL 
S4SUBQ 
S8ADDL 
S8ADDQ 
S8SUBL 
Format


Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Mem 
Opr 
Opr 
Opr 
Opr 
Mfc 
F-P 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
F-P 
F-P 
F-P 
Opr 


Opr 


F-P 

F-P 

Opr 
Opr 
Opr 
Opr 
Mfc 
Mbr 
Mfc 
Mfc 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 
Opr 


Opcode 


08 
09 
0A 
0C 
20 
21 
28 
2A 
29 
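2B
0B
22
23
1C.3E
1C.3F
1C.3C
1C.3D
18.4000
17.025
1C.38
1C.39
1C.3A
1C.3B
12.02
12.62
12.22
12.72
12.32
12.52
12.12
17.024
15.082
15.0A2
13.00
13.40
13.20
13.60
16.082
16.0A2
11.28
1C.31
1C.37
1C.36
18.E000
1A.2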


18.C000 
18.F000 
10.02 
10.22 
10.0B 
10.2B 
10.12 
10.32 
10.1B 


Description 


Load address 

Load address high 

Load zero-extended byte 

Load zero-extended word 

Load F_floating 

Load G_floating 

Load sign-extended longword 
Load sign-extended longword locked 
Load quadword 

Load quadword locked 

Load unaligned quadword 

Load S_floating

Load T_floating 

Vector signed byte maximum 
Vector signed word maximum 
Vector unsigned byte maximum 
Vector unsigned word maximum 
Memory barrier 

Move from FPCR 

Vector signed byte minimum 
Vector signed word minimum 
Vector unsigned byte minimum 
Vector unsigned word minimum 
Mask byte low 

Mask longword high 

Mask longword low 

Mask quadword high 

Mask quadword low 

Mask word high 

Mask word low 

Move to FPCR 

Multiply F_floating 

Multiply G_floating 

Multiply longword 


Multiply quadword 


Multiply S_floating 

Multiply T_floating 

Logical sum with complement 
Pixel error 

Pack longwords to bytes 

Pack words to bytes 

Read and clear 

Return from subroutine 

Read process cycle counter
Read and set 

Scaled add longword by 4 
Scaled add quadword by 4 
Scaled subtract longword by 4 
Scaled subtract quadword by 4 
Scaled add longword by 8 
Scaled add quadword by 8 
Scaled subtract longword by 8 


DIGITAL Restricted Distribution 


Table C-2: Common Architecture Instructions (Continued)


Mnemonic Format Opcode Description 

S8SUBQ Opr 10.3B Scaled subtract quadword by 8 
SEXTB Opr 1C.00 Sign extend byte 

SEXTW Opr 1C.01 Sign extend word 

SLL Opr 12.39 Shift left logical 

SQRTF F-P 14.08A Square root F_floating 
SQRTG F-P 14.0AA Square root G_floating 
SQRTS F-P 14.08B Square root S_floating 
SQRTT F-P 14.0AB Square root T_floating 
SRA Opr 12.3C Shift right arithmetic

SRL Opr 12.34 Shift right logical 

STB Mem 0E Store byte

STF Mem 24 Store F_floating 

STG Mem 25 Store G_floating 

STS Mem 26 Store S_floating 

STL Mem 2C Store longword

STL_C Mem 2E Store longword conditional 
STQ Mem 2D Store quadword 

STQ_C Mem 2F Store quadword conditional 
STQ_U Mem 0F Store unaligned quadword
STT Mem 27 Store T_floating 

STW Mem 0D Store word 

SUBF F-P 15.081 Subtract F_floating 

SUBG F-P 15.0A1 Subtract G_floating 

SUBL Opr 10.09 Subtract longword 
SUBL/V 10.49 

SUBQ Opr 10.29 Subtract quadword 
SUBQ/V 10.69 

SUBS F-P 16.081 Subtract S_floating 

SUBT F-P 16.0A1 Subtract T_floating
TRAPB Mfc 18.0000 Trap barrier 

UMULH Opr 13.30 Unsigned multiply quadword high 
UNPKBL Opr 1C.35 Unpack bytes to longwords 
UNPKBW Opr 1C.34 Unpack bytes to words
WH64 Mfc 18.F800 Write hint — 64 bytes 
WMB Mfc 18.4400 Write memory barrier 
XOR Opr 11.40 Logical difference 

ZAP Opr 12.30 Zero bytes 

ZAPNOT Opr 12.31 Zero bytes not 
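
As an illustration of the opcode notation used in Table C-2, the following Python sketch splits an entry such as 10.20 into its major opcode and function code and packs a register-form Operate instruction. It is not part of the architecture definition: the helper names are hypothetical, and the field positions (opcode in bits <31:26>, Ra in <25:21>, Rb in <20:16>, a 7-bit function in <11:5>, Rc in <4:0>) are assumed from the Operate instruction format rather than stated in this appendix.

```python
def parse_opcode_notation(text):
    """Split a Table C-2 style entry such as '10.20' into (opcode, function).

    Entries without a function part (for example '3F') return None for it.
    """
    major, _, func = text.partition(".")
    opcode = int(major, 16)
    function = int(func, 16) if func else None
    return opcode, function


def encode_operate(opcode, function, ra, rb, rc):
    """Pack a register-form Operate instruction (assumed field layout)."""
    for reg in (ra, rb, rc):
        if not 0 <= reg <= 31:
            raise ValueError("register number must be 0..31")
    if not 0 <= function <= 0x7F:
        raise ValueError("Operate function codes are 7 bits wide")
    return (opcode << 26) | (ra << 21) | (rb << 16) | (function << 5) | rc


# ADDQ is listed as 10.20; encode ADDQ R1,R2,R3 as an example.
opcode, function = parse_opcode_notation("10.20")
word = encode_operate(opcode, function, ra=1, rb=2, rc=3)
print(f"{word:08X}")  # prints 40220403
```

Memory-format and branch-format entries (for example 3F) carry no function code, which is why the parser returns None for that part instead of zero.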
C.2 IEEE Floating-Point Instructions
Table C-3 lists the hexadecimal value of the 11-bit function code field for the IEEE floating-point
instructions, with and without qualifiers. The opcode for the following instructions is 16₁₆,
except for SQRTS and SQRTT, which are opcode 14₁₆.


Table C-3: IEEE Floating-Point Instruction Function Codes

         None  /C    /M    /D    /U    /UC   /UM   /UD
ADDS     080   000   040   0C0   180   100   140   1C0
ADDT     0A0   020   060   0E0   1A0   120   160   1E0
CMPTEQ   0A5
CMPTLT   0A6
CMPTLE   0A7
CMPTUN   0A4
CVTQS    0BC   03C   07C   0FC
CVTQT    0BE   03E   07E   0FE
CVTST    See below
CVTTQ    See below
CVTTS    0AC   02C   06C   0EC   1AC   12C   16C   1EC
DIVS     083   003   043   0C3   183   103   143   1C3
DIVT     0A3   023   063   0E3   1A3   123   163   1E3
MULS     082   002   042   0C2   182   102   142   1C2
MULT     0A2   022   062   0E2   1A2   122   162   1E2
SQRTS    08B   00B   04B   0CB   18B   10B   14B   1CB
SQRTT    0AB   02B   06B   0EB   1AB   12B   16B   1EB
SUBS     081   001   041   0C1   181   101   141   1C1
SUBT     0A1   021   061   0E1   1A1   121   161   1E1

         /SU   /SUC  /SUM  /SUD  /SUI  /SUIC /SUIM /SUID
ADDS     580   500   540   5C0   780   700   740   7C0
ADDT     5A0   520   560   5E0   7A0   720   760   7E0
CMPTEQ   5A5
CMPTLT   5A6
CMPTLE   5A7
CMPTUN   5A4
CVTQS                            7BC   73C   77C   7FC
CVTQT                            7BE   73E   77E   7FE
CVTTS    5AC   52C   56C   5EC   7AC   72C   76C   7EC
DIVS     583   503   543   5C3   783   703   743   7C3
DIVT     5A3   523   563   5E3   7A3   723   763   7E3
MULS     582   502   542   5C2   782   702   742   7C2
MULT     5A2   522   562   5E2   7A2   722   762   7E2
SQRTS    58B   50B   54B   5CB   78B   70B   74B   7CB
SQRTT    5AB   52B   56B   5EB   7AB   72B   76B   7EB
SUBS     581   501   541   5C1   781   701   741   7C1
SUBT     5A1   521   561   5E1   7A1   721   761   7E1

         None  /S
CVTST    2AC   6AC
Table C-3: IEEE Floating-Point Instruction Function Codes (Continued)

         None  /C    /V    /VC   /SV   /SVC  /SVI  /SVIC
CVTTQ    0AF   02F   1AF   12F   5AF   52F   7AF   72F

         /D    /VD   /SVD  /SVID /M    /VM   /SVM  /SVIM
CVTTQ    0EF   1EF   5EF   7EF   06F   16F   56F   76F


Programming Note: 


To use CMPTxx with software completion trap handling, specify the /SU IEEE trap mode, 
even though an underflow trap is not possible. To use CVTQS or CVTQT with software 
completion trap handling, specify the /SUI IEEE trap mode, even though an underflow 
trap is not possible. 
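
The arithmetic rows of Table C-3 appear to follow a regular bit pattern: the rounding qualifier selects bits <7:6> of the function code, the trap qualifier selects bits <10:8>, and the low six bits identify the operation. The Python sketch below reproduces table entries from that observed pattern. It is an inference from the listed values, not an architectural definition; the names are hypothetical, and it deliberately does not cover the CMPTxx, CVTST, or CVTTQ rows, which use different qualifier sets.

```python
# Bit patterns inferred from the ADDx/SUBx/MULx/DIVx/SQRTx rows of Table C-3.
ROUNDING = {"": 0x080, "C": 0x000, "M": 0x040, "D": 0x0C0}     # bits <7:6>
TRAPPING = {"": 0x000, "U": 0x100, "SU": 0x500, "SUI": 0x700}  # bits <10:8>


def ieee_function_code(base, trap="", rounding=""):
    """Return the 11-bit function code for an arithmetic row of Table C-3.

    base     -- the unqualified ("None" column) code, e.g. 0x0A0 for ADDT
    trap     -- "", "U", "SU", or "SUI"
    rounding -- "" (no rounding qualifier), "C", "M", or "D"
    """
    operation = base & 0x03F  # low six bits identify the operation
    return TRAPPING[trap] | ROUNDING[rounding] | operation


for name, base, trap, rounding in [
    ("ADDT", 0x0A0, "SU", "D"),
    ("SQRTS", 0x08B, "U", "C"),
    ("DIVT", 0x0A3, "SUI", "M"),
]:
    code = ieee_function_code(base, trap, rounding)
    print(f"{name}/{trap}{rounding}: {code:03X}")
# prints ADDT/SUD: 5E0, SQRTS/UC: 10B, DIVT/SUIM: 763
```

Each printed value matches the corresponding entry in Table C-3, which is the only check this sketch supports; the table remains the authoritative source.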


C.3 VAX Floating-Point Instructions
Table C-4 lists the hexadecimal value of the 11-bit function code field for the VAX floating-point
instructions. The opcode for the following instructions is 15₁₆, except for SQRTF and SQRTG,
which are opcode 14₁₆.


Table C-4: VAX Floating-Point Instruction Function Codes

         None  /C    /U    /UC   /S    /SC   /SU   /SUC
ADDF     080   000   180   100   480   400   580   500
CVTDG    09E   01E   19E   11E   49E   41E   59E   51E
ADDG     0A0   020   1A0   120   4A0   420   5A0   520
CMPGEQ   0A5                     4A5
CMPGLT   0A6                     4A6
CMPGLE   0A7                     4A7
CVTGF    0AC   02C   1AC   12C   4AC   42C   5AC   52C
CVTGD    0AD   02D   1AD   12D   4AD   42D   5AD   52D
CVTGQ    See below
CVTQF    0BC   03C
CVTQG    0BE   03E
DIVF     083   003   183   103   483   403   583   503
DIVG     0A3   023   1A3   123   4A3   423   5A3   523
MULF     082   002   182   102   482   402   582   502
MULG     0A2   022   1A2   122   4A2   422   5A2   522
SQRTF    08A   00A   18A   10A   48A   40A   58A   50A
SQRTG    0AA   02A   1AA   12A   4AA   42A   5AA   52A
SUBF     081   001   181   101   481   401   581   501
SUBG     0A1   021   1A1   121   4A1   421   5A1   521

         None  /C    /V    /VC   /S    /SC   /SV   /SVC
CVTGQ    0AF   02F   1AF   12F   4AF   42F   5AF   52F
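
To show where these 11-bit function codes end up, the following Python sketch packs a floating-point operate instruction word. It assumes the usual field layout of that format (opcode in bits <31:26>, Fa in <25:21>, Fb in <20:16>, function in <15:5>, Fc in <4:0>), which is not restated in this appendix; the helper name is hypothetical.

```python
def encode_fp_operate(opcode, function, fa, fb, fc):
    """Pack a floating-point operate instruction (assumed field layout)."""
    if not 0 <= opcode <= 0x3F:
        raise ValueError("opcode is a 6-bit field")
    if not 0 <= function <= 0x7FF:
        raise ValueError("function code is an 11-bit field")
    for reg in (fa, fb, fc):
        if not 0 <= reg <= 31:
            raise ValueError("register number must be 0..31")
    return (opcode << 26) | (fa << 21) | (fb << 16) | (function << 5) | fc


# ADDG/SC has function code 420 under opcode 15 in Table C-4.
word = encode_fp_operate(0x15, 0x420, fa=1, fb=2, fc=3)
print(f"{word:08X}")  # prints 54228403
```

The range checks reject a function code wider than 11 bits, which guards against pasting a full oo.fff value into the function argument by mistake.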
C.4 Independent Floating-Point Instructions
Table C-5 lists the hexadecimal value of the 11-bit function code field for the floating-point
instructions that are not directly tied to IEEE or VAX floating point. The opcode for the
following instructions is 17₁₆.


Table C-5: Independent Floating-Point Instruction Function Codes

           None  /V    /SV
CPYS       020
CPYSE      022
CPYSN      021
CVTLQ      010
CVTQL      030   130   530
FCMOVEQ    02A
FCMOVGE    02D
FCMOVGT    02F
FCMOVLE    02E
FCMOVLT    02C
FCMOVNE    02B
MF_FPCR    025
MT_FPCR    024


C.5 Opcode Summary


Table C-6 lists all Alpha opcodes from 00 (CALL_PAL) through 3F (BGT). In the table, the
column headings that appear over the instructions have a granularity of 8₁₆. The rows beneath
the leftmost column supply the individual hex number to resolve that granularity.


If an instruction column has a 0 (zero) in the right (low) hex digit, replace that 0 with the
number to the left of the slash in the leftmost column on the instruction's row. If an
instruction column has an 8 in the right (low) hexadecimal digit, replace that 8 with the
number to the right of the slash in the leftmost column.


For example, the third row (2/A) under the 10 column contains the symbol INTS*, representing
all the integer shift instructions. The opcode for those instructions would then be 12₁₆
because the 0 in 10 is replaced by the 2 in the leftmost column. Likewise, the third row under
the 18 column contains the symbol JSR*, representing all jump instructions. The opcode for
those instructions is 1A₁₆ because the 8 in the heading is replaced by the number to the right of
the slash in the leftmost column.


The instruction format is listed under the instruction symbol. The symbols in Table C-6 are
explained in Table C-7.
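
The lookup rule described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes column headings and row labels are passed as the strings printed in Table C-6 (for example "10" and "2/A").

```python
def summary_opcode(column_heading, row_label):
    """Resolve a Table C-6 cell to its opcode.

    column_heading -- two hex digits ending in 0 or 8, e.g. "10" or "18"
    row_label      -- the leftmost-column entry for the row, e.g. "2/A"
    """
    left, right = row_label.split("/")
    high, low = column_heading[:-1], column_heading[-1]
    if low == "0":
        digit = left    # a heading ending in 0 takes the left-hand number
    elif low == "8":
        digit = right   # a heading ending in 8 takes the right-hand number
    else:
        raise ValueError("column headings end in 0 or 8")
    return int(high + digit, 16)


print(f"{summary_opcode('10', '2/A'):02X}")  # INTS* cell, prints 12
print(f"{summary_opcode('18', '2/A'):02X}")  # JSR* cell, prints 1A
```

Both results agree with the worked example in the text: the INTS* group resolves to opcode 12 and the JSR* group to opcode 1A.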


Table C-6: Opcode Summary

       00      08      10      18      20      28      30      38
0/8    PAL*    LDA     INTA*   MISC*   LDF     LDL     BR      BLBC
       (pal)   (mem)   (op)    (mem)   (mem)   (mem)   (br)    (br)
1/9    Res     LDAH    INTL*   \PAL\   LDG     LDQ     FBEQ    BEQ
               (mem)   (op)            (mem)   (mem)   (br)    (br)
2/A    Res     LDBU    INTS*   JSR*    LDS     LDL_L   FBLT    BLT
               (mem)   (op)    (mem)   (mem)   (mem)   (br)    (br)
3/B    Res     LDQ_U   INTM*   \PAL\   LDT     LDQ_L   FBLE    BLE
               (mem)   (op)            (mem)   (mem)   (br)    (br)
4/C    Res     LDWU    ITFP*   FPTI*   STF     STL     BSR     BLBS
               (mem)                   (mem)   (mem)   (br)    (br)
5/D    Res     STW     FLTV*   \PAL\   STG     STQ     FBNE    BNE
               (mem)   (op)            (mem)   (mem)   (br)    (br)
6/E    Res     STB     FLTI*   \PAL\   STS     STL_C   FBGE    BGE
               (mem)   (op)            (mem)   (mem)   (br)    (br)
7/F    Res     STQ_U   FLTL*   \PAL\   STT     STQ_C   FBGT    BGT
               (mem)   (op)            (mem)   (mem)   (br)    (br)


Table C-7: Key to Opcode Summary

Symbol   Meaning
FLTI*    IEEE floating-point instruction opcodes
FLTL*    Floating-point Operate instruction opcodes
FLTV*    VAX floating-point instruction opcodes
FPTI*    Floating-point to integer register move opcodes
INTA*    Integer arithmetic instruction opcodes
INTL*    Integer logical instruction opcodes
INTM*    Integer multiply instruction opcodes
INTS*    Integer shift instruction opcodes
ITFP*    Integer to floating-point register move opcodes
JSR*     Jump instruction opcodes
MISC*    Miscellaneous instruction opcodes
PAL*     PALcode instruction (CALL_PAL) opcodes
\PAL\    Reserved for PALcode
Res      Reserved for DIGITAL
C.6 Common Architecture Opcodes in Numerical Order


Table C-8: Common Architecture Opcodes in Numerical Order

Opcode           Opcode            Opcode
00     CALL_PAL  11.26   CMOVNE    14.014  ITOFF
01     OPC01     11.28   ORNOT     14.024  ITOFT
02     OPC02     11.40   XOR       14.02A  SQRTG/C
03     OPC03     11.44   CMOVLT    14.02B  SQRTT/C
04     OPC04     11.46   CMOVGE    14.04B  SQRTS/M
05     OPC05     11.48   EQV       14.06B  SQRTT/M
06     OPC06     11.61   AMASK     14.08A  SQRTF
07     OPC07     11.64   CMOVLE    14.08B  SQRTS
08     LDA       11.66   CMOVGT    14.0AA  SQRTG
09     LDAH      11.6C   IMPLVER   14.0AB  SQRTT
0A     LDBU      12.02   MSKBL     14.0CB  SQRTS/D
0B     LDQ_U     12.06   EXTBL     14.0EB  SQRTT/D
0C     LDWU      12.0B   INSBL     14.10A  SQRTF/UC
0D     STW       12.12   MSKWL     14.10B  SQRTS/UC
0E     STB       12.16   EXTWL     14.12A  SQRTG/UC
0F     STQ_U     12.1B   INSWL     14.12B  SQRTT/UC
10.00  ADDL      12.22   MSKLL     14.14B  SQRTS/UM
10.02  S4ADDL    12.26   EXTLL     14.16B  SQRTT/UM
10.09  SUBL      12.2B   INSLL     14.18A  SQRTF/U
10.0B  S4SUBL    12.30   ZAP       14.18B  SQRTS/U
10.0F  CMPBGE    12.31   ZAPNOT    14.1AA  SQRTG/U
10.12  S8ADDL    12.32   MSKQL     14.1AB  SQRTT/U
10.1B  S8SUBL    12.34   SRL       14.1CB  SQRTS/UD
10.1D  CMPULT    12.36   EXTQL     14.1EB  SQRTT/UD
10.20  ADDQ      12.39   SLL       14.40A  SQRTF/SC
10.22  S4ADDQ    12.3B   INSQL     14.42A  SQRTG/SC
10.29  SUBQ      12.3C   SRA       14.48A  SQRTF/S
10.2B  S4SUBQ    12.52   MSKWH     14.4AA  SQRTG/S
10.2D  CMPEQ     12.57   INSWH     14.50A  SQRTF/SUC
10.32  S8ADDQ    12.5A   EXTWH     14.50B  SQRTS/SUC
10.3B  S8SUBQ    12.62   MSKLH     14.52A  SQRTG/SUC
10.3D  CMPULE    12.67   INSLH     14.52B  SQRTT/SUC
10.40  ADDL/V    12.6A   EXTLH     14.54B  SQRTS/SUM
10.49  SUBL/V    12.72   MSKQH     14.56B  SQRTT/SUM
10.4D  CMPLT     12.77   INSQH     14.58A  SQRTF/SU
10.60  ADDQ/V    12.7A   EXTQH     14.58B  SQRTS/SU
10.69  SUBQ/V    13.00   MULL      14.5AA  SQRTG/SU
10.6D  CMPLE     13.20   MULQ      14.5AB  SQRTT/SU
11.00  AND       13.30   UMULH     14.5CB  SQRTS/SUD
11.08  BIC       13.40   MULL/V    14.5EB  SQRTT/SUD
11.14  CMOVLBS   13.60   MULQ/V    14.70B  SQRTS/SUIC
11.16  CMOVLBC   14.004  ITOFS     14.72B  SQRTT/SUIC
11.20  BIS       14.00A  SQRTF/C   14.74B  SQRTS/SUIM
11.24  CMOVEQ    14.00B  SQRTS/C   14.76B  SQRTT/SUIM
Table C-8: Common Architecture Opcodes in Numerical Order (Continued)


Opcode 


14.78B 
14.7AB 
14.7CB 
14.7EB 
15.000 
15.001 
15.002 
15.003 
15.01E 
15.020 
15.021
15.022 
15.023 
15.02C 
15.02D 
15.02F 
15.03C 
15.03E 
15.080 
15.081 
15.082 
15.083 
15.09E 
15.0A0 
15.0A1 
15.0A2 
15.0A3 
15.0A5 
15.0A6 
15.0A7 
15.0AC 
15.0AD 
15.0AF 
15.0BC 
15.0BE 
15.100 
15.101 
15.102 
15.103 
15.11E 
15.120 
15.121 
15.122 
15.123 
15.12C 
15.12D 


SQRTS/SUI
SQRTT/SUI 
SQRTS/SUID 
SQRTT/SUID 
ADDF/C 
SUBF/C 
MULF/C
DIVF/C
CVTDG/C
ADDG/C 
SUBG/C 
MULG/C 
DIVG/C 
CVTGF/C 
CVTGD/C 
CVTGQ/C 
CVTQF/C 
CVTQG/C
ADDF
SUBF
MULF
DIVF
CVTDG 
ADDG 
SUBG 
MULG 
DIVG 
CMPGEQ 
CMPGLT 
CMPGLE 
CVTGF 
CVTGD 
CVTGQ 
CVTQF 
CVTQG 
ADDF/UC 
SUBF/UC 
MULF/UC 
DIVF/UC 
CVTDG/UC 
ADDG/UC 
SUBG/UC 
MULG/UC 
DIVG/UC 
CVTGF/UC
CVTGD/UC 


Opcode 


15.12F 
15.180 
15.181 
15.182 
15.183 
15.19E 
15.1A0 
15.1A1
15.1A2 
15.1A3 
15.1AC 
15.1AD 
15.1AF 
15.400 
15.401 
15.402 
15.403 
15.41E 
15.420 
15.421 
15.422 
15.423 
15.42C 
15.42D 
15.42F 
15.480 
15.481 
15.482 
15.483 
15.49E 
15.4A0 


15.4A1


15.4A2 
15.4A3 
15.4A5 
15.4A6 
15.4A7 
15.4AC 
15.4AD 
15.4AF 
15.500 
15.501 
15.502 
15.503 
15.51E 
15.520 
CVTGQ/VC
ADDF/U 
SUBF/U 
MULF/U 
DIVF/U 
CVTDG/U 
ADDG/U 
SUBG/U 
MULG/U 
DIVG/U 
CVTGF/U 
CVTGD/U 
CVTGQ/V 
ADDF/SC 
SUBF/SC 
MULF/SC 
DIVF/SC
CVTDG/SC 
ADDG/SC 
SUBG/SC 
MULG/SC 
DIVG/SC 
CVTGF/SC 
CVTGD/SC 
CVTGQ/SC 
ADDF/S 
SUBF/S 
MULF/S 
DIVF/S 
CVTDG/S 
ADDG/S 
SUBG/S 
MULG/S 
DIVG/S 
CMPGEQ/S 


CMPGLT/S


CMPGLE/S 
CVTGF/S 
CVTGD/S 
CVTGQ/S 
ADDF/SUC 
SUBF/SUC 
MULF/SUC 
DIVF/SUC 
CVTDG/SUC 
ADDG/SUC 


Opcode 

15.521 SUBG/SUC 
15.522 MULG/SUC 
15.523 DIVG/SUC
15.52C CVTGF/SUC 
15.52D CVTGD/SUC 
15.52F CVTGQ/SVC 
15.580 ADDF/SU 
15.581 SUBF/SU 
15.582 MULF/SU 
15.583 DIVF/SU 
15.59E CVTDG/SU 
15.5A0 ADDG/SU 
15.5A1 SUBG/SU 
15.5A2 MULG/SU 
15.5A3 DIVG/SU 
15.5AC CVTGF/SU 
15.5AD CVTGD/SU 
15.5AF CVTGQ/SV 
16.000 ADDS/C 
16.001 SUBS/C 
16.002 MULS/C 
16.003 DIVS/C 
16.020 ADDT/C 
16.021 SUBT/C 
16.022 MULT/C 
16.023 DIVT/C 
16.02C CVTTS/C 
16.02F CVTTQ/C 
16.03C CVTQS/C 
16.03E CVTQT/C 
16.040 ADDS/M 
16.041 SUBS/M 
16.042 MULS/M 
16.043 DIVS/M 
16.060 ADDT/M 
16.061 SUBT/M 
16.062 MULT/M 
16.063 DIVT/M 
16.06C CVTTS/M 
16.06F CVTTQ/M
16.07C CVTQS/M
16.07E CVTQT/M 
16.080 ADDS 
16.081 SUBS 
16.082 MULS 
16.083 DIVS 
Table C-8: Common Architecture Opcodes in Numerical Order (Continued)


Opcode Opcode Opcode 
16.0A0 ADDT 16.182 MULS/U 16.5A3 DIVT/SU 
16.0A1 SUBT 16.183 DIVS/U 16.5A4 CMPTUN/SU 
16.0A2 MULT 16.1A0 ADDT/U 16.5A5 CMPTEQ/SU
16.0A3 DIVT 16.1A1 SUBT/U 16.5A6 CMPTLT/SU 
16.0A4 CMPTUN 16.1A2 MULT/U 16.5A7 CMPTLE/SU 
16.0A5 CMPTEQ 16.1A3 DIVT/U 16.5AC CVTTS/SU 
16.0A6 CMPTLT 16.1AC CVTTS/U 16.5AF CVTTQ/SV 
16.0A7 CMPTLE 16.1AF CVTTQ/V 16.5C0 ADDS/SUD
16.0AC CVTTS 16.1C0 ADDS/UD 16.5C1 SUBS/SUD 
16.0AF CVTTQ 16.1C1 SUBS/UD 16.5C2 MULS/SUD 
16.0BC CVTQS 16.1C2 MULS/UD 16.5C3 DIVS/SUD 
16.0BE CVTQT 16.1C3 DIVS/UD 16.5E0 ADDT/SUD
16.0C0 ADDS/D 16.1E0 ADDT/UD 16.5E1 SUBT/SUD 
16.0C1  SUBS/D 16.1E1 SUBT/UD 16.5E2 MULT/SUD 
16.0C2 MULS/D 16.1E2 MULT/UD 16.5E3 DIVT/SUD 
16.0C3 DIVS/D 16.1E3 DIVT/UD 16.5EC CVTTS/SUD
16.0E0 ADDT/D 16.1EC CVTTS/UD 16.5EF CVTTQ/SVD 
16.0E1 SUBT/D 16.1EF CVTTQ/VD 16.6AC CVTST/S 
16.0E2 MULT/D 16.2AC CVTST 16.700 ADDS/SUIC 
16.0E3 DIVT/D 16.500 ADDS/SUC 16.701 SUBS/SUIC 
16.0EC CVTTS/D 16.501 SUBS/SUC 16.702 MULS/SUIC 
16.0EF CVTTQ/D 16.502 MULS/SUC 16.703 DIVS/SUIC 
16.0FC CVTQS/D 16.503 DIVS/SUC 16.720 ADDT/SUIC 
16.0FE CVTQT/D 16.520 ADDT/SUC 16.721 SUBT/SUIC 
16.100 ADDS/UC 16.521 SUBT/SUC 16.722 MULT/SUIC 
16.101 SUBS/UC 16.522 MULT/SUC 16.723 DIVT/SUIC 
16.102 MULS/UC 16.523 DIVT/SUC 16.72C CVTTS/SUIC 
16.103 DIVS/UC 16.52C CVTTS/SUC 16.72F  CVTTQ/SVIC 
16.120 ADDT/UC 16.52F CVTTQ/SVC 16.73C CVTQS/SUIC 
16.121 SUBT/UC 16.540 ADDS/SUM 16.73E CVTQT/SUIC 
16.122 MULT/UC 16.541 SUBS/SUM 16.740 ADDS/SUIM 
16.123 DIVT/UC 16.542 MULS/SUM 16.741 SUBS/SUIM 
16.12C CVTTS/UC 16.543 DIVS/SUM 16.742 MULS/SUIM 
16.12F CVTTQ/VC 16.560 ADDT/SUM 16.743 DIVS/SUIM 
16.140 ADDS/UM 16.561 SUBT/SUM 16.760 ADDT/SUIM 
16.141 SUBS/UM 16.562 MULT/SUM 16.761 SUBT/SUIM 
16.142 MULS/UM 16.563 DIVT/SUM 16.762 MULT/SUIM 
16.143 DIVS/UM 16.56C CVTTS/SUM 16.763 DIVT/SUIM 
16.160 ADDT/UM 16.56F CVTTQ/SVM 16.76C CVTTS/SUIM 
16.161 SUBT/UM 16.580 ADDS/SU 16.76F CVTTQ/SVIM 
16.162 MULT/UM 16.581 SUBS/SU 16.77C CVTQS/SUIM 
16.163 DIVT/UM 16.582 MULS/SU 16.77E CVTQT/SUIM 
16.16C CVTTS/UM 16.583 DIVS/SU 16.780 ADDS/SUI 
16.16F CVTTQ/VM 16.5A0 ADDT/SU 16.781 SUBS/SUI
16.180 ADDS/U 16.5A1 SUBT/SU 16.782 MULS/SUI 
16.181 SUBS/U 16.5A2 MULT/SU 16.783 DIVS/SUI 
Table C-8: Common Architecture Opcodes in Numerical Order (Continued)


Opcode 


16.7A0 
16.7A1 
16.7A2 
16.7A3 
16.7AC 
16.7AF 
16.7BC 
16.7BE 
16.7C0
16.7C1 
16.7C2 
16.7C3 
16.7E0 
16.7E1 
16.7E2 
16.7E3 
16.7EC 
16.7EF 
16.7FC 
16.7FE 
17.010 
17.020 
17.021 
17.022 
17.024 
17.025 
17.02A 
17.02B 
17.02C 
17.02D 
17.02E 
17.02F 
17.030 
17.130 
17.530 
18.0000 
18.0400 


ADDT/SUI 
SUBT/SUI 
MULT/SUI 
DIVT/SUI 
CVTTS/SUI 
CVTTQ/SVI 
CVTQS/SUI 
CVTQT/SUI 
ADDS/SUID 
SUBS/SUID 
MULS/SUID 
DIVS/SUID 
ADDT/SUID 
SUBT/SUID 
MULT/SUID 
DIVT/SUID 
CVTTS/SUID 
CVTTQ/SVID 
CVTQS/SUID 
CVTQT/SUID 
CVTLQ 
CPYS 
CPYSN 
CPYSE 
MT_FPCR 
MF_FPCR
FCMOVEQ 
FCMOVNE 
FCMOVLT 
FCMOVGE 
FCMOVLE 
FCMOVGT 
CVTQL 
CVTQL/V 
CVTQL/SV 
TRAPB 
EXCB 


Opcode 


18.4000 
18.4400 
18.8000 
18.A000 
18.C000 
18.E000 
18.E800 
18.F000 
18.F800 
19 

1A.0 
1A.1 
1A.2 
1A.3 

1B 
1C.00 
1C.01 
1C.30
1C.31
1C.32
1C.33
1C.34
1C.35
1C.36
1C.37
1C.38
1C.39
1C.3A
1C.3B
1C.3C
1C.3D
1C.3E
1C.3F
1C.70
1C.78 
1D 

1E 
MB

WMB 
FETCH 
FETCH_M 
RPCC 

RC 

ECB 

RS 

WH64 
PAL19 
JMP 

JSR 

RET 
JSR_COROUTINE 
PAL1B
SEXTB 
SEXTW 
CTPOP 
PERR 
CTLZ 
CTTZ 
UNPKBW 
UNPKBL 
PKWB 
PKLB
MINSB8 
MINSW4 
MINUB8 
MINUW4 
MAXUB8 
MAXUW4 


MAXSB8 


MAXSW4 
FTOIT 
FTOIS 
PAL1D 
PAL1E


Opcode 
1F PAL1F
20 LDF 
21 LDG 
22 LDS 
23 LDT 
24 STF 
25 STG 
26 STS 
27 STT 
28 LDL 
29 LDQ 
2A LDL_L 
2B LDQ_L 
2C STL
2D STQ
2E STL_C 
2F STQ_C 
30 BR 
31 FBEQ 
32 FBLT
33 FBLE 
34 BSR 
35 FBNE 
36 FBGE 
37 FBGT 
38 BLBC 
39 BEQ 
3A BLT 
3B BLE
3C BLBS 
3D BNE 
3E BGE 
3F BGT 
C.7 OpenVMS Alpha PALcode Instruction Summary


Table C-9: OpenVMS Alpha Unprivileged PALcode Instructions

Mnemonic     Opcode     Description
AMOVRM       00.00A1    Atomic move from register to memory
AMOVRR       00.00A0    Atomic move from register to register
BPT          00.0080    Breakpoint
BUGCHK       00.0081    Bugcheck
CHMK         00.0083    Change mode to kernel
CHME         00.0082    Change mode to executive
CHMS         00.0084    Change mode to supervisor
CHMU         00.0085    Change mode to user
CLRFEN       00.00AE    Clear floating-point enable
GENTRAP      00.00AA    Generate software trap
IMB          00.0086    I-stream memory barrier
INSQHIL      00.0087    Insert into longword queue at head interlocked
INSQHILR     00.00A2    Insert into longword queue at head interlocked resident
INSQHIQ      00.0089    Insert into quadword queue at head interlocked
INSQHIQR     00.00A4    Insert into quadword queue at head interlocked resident
INSQTIL      00.0088    Insert into longword queue at tail interlocked
INSQTILR     00.00A3    Insert into longword queue at tail interlocked resident
INSQTIQ      00.008A    Insert into quadword queue at tail interlocked
INSQTIQR     00.00A5    Insert into quadword queue at tail interlocked resident
INSQUEL      00.008B    Insert entry into longword queue
INSQUEL/D    00.008D    Insert entry into longword queue deferred
INSQUEQ      00.008C    Insert entry into quadword queue
INSQUEQ/D    00.008E    Insert entry into quadword queue deferred
PROBER       00.008F    Probe for read access
PROBEW       00.0090    Probe for write access
RD_PS        00.0091    Move processor status
READ_UNQ     00.009E    Read unique context
REI          00.0092    Return from exception or interrupt
REMQHIL      00.0093    Remove from longword queue at head interlocked
REMQHILR     00.00A6    Remove from longword queue at head interlocked resident
REMQHIQ      00.0095    Remove from quadword queue at head interlocked
REMQHIQR     00.00A8    Remove from quadword queue at head interlocked resident
REMQTIL      00.0094    Remove from longword queue at tail interlocked
REMQTILR     00.00A7    Remove from longword queue at tail interlocked resident
REMQTIQ      00.0096    Remove from quadword queue at tail interlocked
REMQTIQR     00.00A9    Remove from quadword queue at tail interlocked resident
REMQUEL      00.0097    Remove entry from longword queue
REMQUEL/D    00.0099    Remove entry from longword queue deferred
REMQUEQ      00.0098    Remove entry from quadword queue
REMQUEQ/D    00.009A    Remove entry from quadword queue deferred
RSCC         00.009D    Read system cycle counter
SWASTEN      00.009B    Swap AST enable for current mode
WRITE_UNQ    00.009F    Write unique context
WR_PS_SW     00.009C    Write processor status software field


Table C-10: OpenVMS Alpha Privileged PALcode Instructions

Mnemonic       Opcode     Description
CFLUSH         00.0001    Cache flush
CSERVE         00.0009    Console service
DRAINA         00.0002    Drain aborts
HALT           00.0000    Halt processor
LDQP           00.0003    Load quadword physical
MFPR_ASN       00.0006    Move from processor register ASN
MFPR_ESP       00.001E    Move from processor register ESP
MFPR_FEN       00.000B    Move from processor register FEN
MFPR_IPL       00.000E    Move from processor register IPL
MFPR_MCES      00.0010    Move from processor register MCES
MFPR_PCBB      00.0012    Move from processor register PCBB
MFPR_PRBR      00.0013    Move from processor register PRBR
MFPR_PTBR      00.0015    Move from processor register PTBR
MFPR_SCBB      00.0016    Move from processor register SCBB
MFPR_SISR      00.0019    Move from processor register SISR
MFPR_SSP       00.0020    Move from processor register SSP
MFPR_TBCHK     00.001A    Move from processor register TBCHK
MFPR_USP       00.0022    Move from processor register USP
MFPR_VPTB      00.0029    Move from processor register VPTB
MFPR_WHAMI     00.003F    Move from processor register WHAMI
MTPR_ASTEN     00.0026    Move to processor register ASTEN
MTPR_ASTSR     00.0027    Move to processor register ASTSR
MTPR_DATFX     00.002E    Move to processor register DATFX
MTPR_ESP       00.001F    Move to processor register ESP
MTPR_FEN       00.000C    Move to processor register FEN
MTPR_IPIR      00.000D    Move to processor register IPIR
MTPR_IPL       00.000F    Move to processor register IPL
MTPR_MCES      00.0011    Move to processor register MCES
MTPR_PERFMON   00.002B    Move to processor register PERFMON
MTPR_PRBR      00.0014    Move to processor register PRBR
MTPR_SCBB      00.0017    Move to processor register SCBB
MTPR_SIRR      00.0018    Move to processor register SIRR
MTPR_SSP       00.0021    Move to processor register SSP
MTPR_TBIA      00.001B    Move to processor register TBIA
MTPR_TBIAP     00.001C    Move to processor register TBIAP
MTPR_TBIS      00.001D    Move to processor register TBIS
MTPR_TBISD     00.0024    Move to processor register TBISD
MTPR_TBISI     00.0025    Move to processor register TBISI
MTPR_USP       00.0023    Move to processor register USP
MTPR_VPTB      00.002A    Move to processor register VPTB
STQP           00.0004    Store quadword physical
SWPCTX         00.0005    Swap privileged context
SWPPAL         00.000A    Swap PALcode image
WTINT          00.003E    Wait for interrupt
C.8 DIGITAL UNIX PALcode Instruction Summary


Table C-11: DIGITAL UNIX Unprivileged PALcode Instructions

Mnemonic     Opcode     Description
bpt          00.0080    Breakpoint trap
bugchk       00.0081    Bugcheck
callsys      00.0083    System call
clrfen       00.00AE    Clear floating-point enable
gentrap      00.00AA    Generate software trap
imb          00.0086    I-stream memory barrier
rdunique     00.009E    Read unique value
urti         00.0092    Return from user mode trap
wrunique     00.009F    Write unique value


Table C-12: DIGITAL UNIX Privileged PALcode Instructions

Mnemonic     Opcode     Description
cflush       00.0001    Cache flush
cserve       00.0009    Console service
draina       00.0002    Drain aborts
halt         00.0000    Halt the processor
rdmces       00.0010    Read machine check error summary register
rdps         00.0036    Read processor status
rdusp        00.003A    Read user stack pointer
rdval        00.0032    Read system value
retsys       00.003D    Return from system call
rti          00.003F    Return from trap or interrupt
swpctx       00.0030    Swap privileged context
swpipl       00.0035    Swap interrupt priority level
swppal       00.000A    Swap PALcode image
tbi          00.0033    Translation buffer invalidate
whami        00.003C    Who am I
wrent        00.0034    Write system entry address
wrfen        00.002B    Write floating-point enable
wripir       00.000D    Write interprocessor interrupt request
wrkgp        00.0037    Write kernel global pointer
wrmces       00.0011    Write machine check error summary register
wrperfmon    00.0039    Performance monitoring function
wrusp        00.0038    Write user stack pointer
wrval        00.0031    Write system value
wrvptptr     00.002D    Write virtual page table pointer
wtint        00.003E    Wait for interrupt
C.9 Windows NT Alpha Instruction Summary


Table C-13: Windows NT Alpha Unprivileged PALcode Instructions

Mnemonic     Opcode     Description
bpt          00.0080    Breakpoint trap
callkd       00.00AD    Call kernel debugger
callsys      00.0083    Call system service
gentrap      00.00AA    Generate trap
imb          00.0086    Instruction memory barrier
kbpt         00.00AC    Kernel breakpoint trap
rdteb        00.00AB    Read TEB internal processor register


Table C-14: Windows NT Alpha Privileged PALcode Instructions

Mnemonic     Opcode     Description
csir         00.000D    Clear software interrupt request
dalnfix      00.0025    Disable alignment fixups
di           00.0008    Disable interrupts
draina       00.0002    Drain aborts
dtbis        00.0016    Data translation buffer invalidate single
ealnfix      00.0024    Enable alignment fixups
ei           00.0009    Enable interrupts
halt         00.0000    Trap to illegal instruction
initpal      00.0004    Initialize the PALcode
initpcr      00.0038    Initialize processor control region data
rdcounters   00.0030    Read PALcode event counters
rdirql       00.0007    Read current IRQL
rdksp        00.0018    Read initial kernel stack
rdmces       00.0012    Read machine check error summary
rdpcr        00.001C    Read PCR (processor control registers)
rdpsr        00.001A    Read processor status register
rdstate      00.0031    Read internal processor state
rdthread     00.001E    Read the current thread value
reboot       00.0003    Transfer to console firmware
restart      00.0001    Restart the processor
retsys       00.000F    Return from system service call
rfe          00.000E    Return from exception
swpirql      00.0006    Swap IRQL
swpksp       00.0019    Swap initial kernel stack
swppal       00.000A    Swap PALcode
swpprocess   00.0011    Swap privileged process context
swpctx       00.0010    Swap privileged thread context
ssir         00.000C    Set software interrupt request
tbia         00.0014    Translation buffer invalidate all
tbim         00.0020    Translation buffer invalidate multiple
tbimasn      00.0021    Translation buffer invalidate multiple ASN
tbis         00.0015    Translation buffer invalidate single
tbisasn      00.0017    Translation buffer invalidate single ASN
wrentry      00.0005    Write system entry
wrmces       00.0013    Write machine check error summary
wrperfmon    00.0032    Write performance monitoring values
C.10 PALcode Opcodes in Numerical Order


Opcodes 00.0038₁₆ through 00.003F₁₆ are reserved for processor implementation-specific
PALcode instructions. All other opcodes are reserved for use by DIGITAL.
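
Table C-15 gives each PALcode opcode in both hexadecimal and decimal. As a small illustration, the Python sketch below converts the hexadecimal form to the decimal form used in the second column and flags the implementation-specific range named above. The function names are hypothetical and the sketch assumes entries are written as 00.xxxx with a hexadecimal function field.

```python
def palcode_function(text):
    """Return the function field of a PALcode entry such as '00.003E'."""
    major, _, func = text.partition(".")
    if int(major, 16) != 0x00 or not func:
        raise ValueError("expected a CALL_PAL entry of the form 00.xxxx")
    return int(func, 16)


def to_decimal_notation(text):
    """Rewrite a hexadecimal entry in the decimal form of Table C-15."""
    return f"00.{palcode_function(text):04d}"


def is_implementation_specific(text):
    """True for 00.0038 through 00.003F, the range reserved above."""
    return 0x38 <= palcode_function(text) <= 0x3F


print(to_decimal_notation("00.003E"))         # prints 00.0062
print(to_decimal_notation("00.0083"))         # prints 00.0131
print(is_implementation_specific("00.003E"))  # prints True
```

The two converted values agree with the WTINT (00.003E, 00.0062) and CHMK (00.0083, 00.0131) rows of Table C-15.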


Table C-15: PALcode Opcodes in Numerical Order 


Opcode₁₆


00.0000 
00.0001 
00.0002 
00.0003 
00.0004 
00.0005 
00.0006 


00.0007 
00.0008 


00.0009 
00.000A 
00.000B 
00.000C 
00.000D 
00.000E 
00.000F 
00.0010 
00.0011 
00.0012 
00.0013 
00.0014 
00.0015 
00.0016 
00.0017 
00.0018 
00.0019 
00.001A 
00.001B 
00.001C 
00.001D 
00.001E 
00.001F 
00.0020 
00.0021 
00.0022 
00.0023 
00.0024 
00.0025 
00.0026 
00.0027 


00.0029 
00.002A


00.002B 
00.002D 
00.002E 
00.0030 
00.0031 
Opcode₁₀


00.0000 
00.0001 
00.0002 
00.0003 
00.0004 
00.0005 
00.0006 
00.0007 
00.0008 
00.0009 
00.0010 
00.0011 
00.0012 
00.0013 
00.0014 
00.0015 
00.0016 
00.0017 
00.0018 
00.0019 
00.0020 
00.0021 
00.0022 
00.0023 
00.0024 
00.0025 
00.0026 
00.0027 
00.0028 
00.0029 
00.0030 
00.0031 
00.0032 
00.0033 
00.0034 
00.0035 
00.0036 
00.0037 
00.0038 
00.0039 
00.0041
00.0042
00.0043 
00.0045 
00.0046 
00.0048 
00.0049 


OpenVMS Alpha 


HALT 
CFLUSH 
DRAINA 
LDQP 

STQP
SWPCTX 
MFPR_ASN 
MTPR_ASTEN 
MTPR_ASTSR 
CSERVE 
SWPPAL 
MFPR_FEN 
MTPR_FEN 
MTPR_IPIR 
MFPR_IPL
MTPR_IPL 
MFPR_MCES 
MTPR_MCES 
MFPR_PCBB 
MFPR_PRBR 
MTPR_PRBR 
MFPR_PTBR 
MFPR_SCBB 
MTPR_SCBB 
MTPR_SIRR 
MFPR_SISR 
MFPR_TBCHK 
MTPR_TBIA 
MTPR_TBIAP 
MTPR_TBIS 
MFPR_ESP 
MTPR_ESP 
MFPR_SSP 
MTPR_SSP 
MFPR_USP 
MTPR_USP 
MTPR_TBISD 
MTPR_TBISI 
MFPR_ASTEN 
MFPR_ASTSR 
MFPR_VPTB 
MTPR_VPTB 
MTPR_PERFMON 


MTPR_DATFX 


DIGITAL 


UNIX 


halt 
cflush 
draina 


cserve 
swppal 


wripir 
rdmces 
wrmces 


wrfen 
wrvptptr 
swpctx 
wrval 


Windows NT 
Alpha 


halt 
restart 
draina 
reboot 
initpal 
wrentry 
swpirql 
rdirql 
di
ei
swppal
ssir
csir 

rfe 
retsys 
swpctx
swpprocess
rdmces
wrmces 
tbia 
tbis 
dtbis 
tbisasn 
rdksp 
swpksp 
rdpsr 


rdpcr 
rdthread 


tbim 
tbimasn 


ealnfix 
dalnfix 


rdcounters 
rdstate 
Table C-15: PALcode Opcodes in Numerical Order (Continued)


Opcode16 Opcode10 OpenVMS Alpha DIGITAL UNIX Windows NT Alpha
00.0032 00.0050 — rdval wrperfmon 
00.0033 00.0051 — tbi — 
00.0034 00.0052 — wrent —
00.0035 00.0053 — swpipl — 
00.0036 00.0054 — rdps — 
00.0037 00.0055 — wrkgp initpcr 
00.0038 00.0056 — wrusp — 
00.0039 00.0057 — wrperfmon —
00.003A 00.0058 — rdusp — 
00.003C 00.0060 — whami — 
00.003D 00.0061 — retsys — 
00.003E 00.0062 WTINT wtint — 
00.003F 00.0063 MFPR_WHAMI rti — 
00.0080 00.0128 BPT bpt bpt 
00.0081 00.0129 BUGCHK bugchk —
00.0082 00.0130 CHME — — 
00.0083 00.0131 CHMK callsys callsys 
00.0084 00.0132 CHMS — — 
00.0085 00.0133 CHMU — — 
00.0086 00.0134 IMB imb imb 
00.0087 00.0135 INSQHIL — — 
00.0088 00.0136 INSQTIL — — 
00.0089 00.0137 INSQHIQ — — 
00.008A 00.0138 INSQTIQ — — 
00.008B 00.0139 INSQUEL — — 
00.008C 00.0140 INSQUEQ — — 
00.008D 00.0141 INSQUEL/D — — 
00.008E 00.0142 INSQUEQ/D — — 
00.008F 00.0143 PROBER — — 
00.0090 00.0144 PROBEW — — 
00.0091 00.0145 RD_PS — — 
00.0092 00.0146 REI urti — 
00.0093 00.0147 REMQHIL — — 
00.0094 00.0148 REMQTIL — — 
00.0095 00.0149 REMQHIQ — —
00.0096 00.0150 REMQTIQ — — 
00.0097 00.0151 REMQUEL — — 
00.0098 00.0152 REMQUEQ — —
00.0099 00.0153 REMQUEL/D — —
00.009A 00.0154 REMQUEQ/D — —
00.009B 00.0155 SWASTEN — —
00.009C 00.0156 WR_PS_SW — —
00.009D 00.0157 RSCC — — 
00.009E 00.0158 READ_UNQ rdunique — 
00.009F 00.0159 WRITE_UNQ wrunique — 
00.00A0 00.0160 AMOVRR — —
00.00A1 00.0161 AMOVRM — —
00.00A2 00.0162 INSQHILR — —
00.00A3 00.0163 INSQTILR — — 
00.00A4 00.0164 INSQHIQR — — 
00.00A5 00.0165 INSQTIQR — — 
00.00A6 00.0166 REMQHILR — — 
00.00A7 00.0167 REMQTILR — — 
00.00A8 00.0168 REMQHIQR — —
00.00A9 00.0169 REMQTIQR — —
00.00AA 00.0170 GENTRAP gentrap gentrap
00.00AB 00.0171 — — rdteb
00.00AC 00.0172 — — kbpt
00.00AD 00.0173 — — callkd
00.00AE 00.0174 CLRFEN clrfen —


C.11 Required PALcode Opcodes


The opcodes listed in Table C-16 are required for all Alpha implementations. The notation
used is oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the hexadecimal 26-bit
function code.


Table C-16: Required PALcode Opcodes


Mnemonic  Type          Opcode
DRAINA    Privileged    00.0002
HALT      Privileged    00.0000
IMB       Unprivileged  00.0086
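

The oo.ffff notation maps onto a 32-bit instruction word with the 6-bit opcode above the 26-bit function code. The following Python sketch packs and unpacks that notation for the three required opcodes. The helper names are illustrative only, and placing the opcode in the six high-order bits is an assumption based on the field widths stated above.

```python
OPCODE_BITS = 6      # width of the oo field
FUNCTION_BITS = 26   # width of the ffff field


def encode(notation: str) -> int:
    """Pack an 'oo.ffff' string (both fields hexadecimal) into a 32-bit word."""
    opcode_text, function_text = notation.split(".")
    opcode = int(opcode_text, 16)
    function = int(function_text, 16)
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError(f"opcode out of range in {notation!r}")
    if not 0 <= function < (1 << FUNCTION_BITS):
        raise ValueError(f"function code out of range in {notation!r}")
    return (opcode << FUNCTION_BITS) | function


def decode(word: int) -> str:
    """Unpack a 32-bit word into 'oo.ffff' form."""
    opcode = (word >> FUNCTION_BITS) & ((1 << OPCODE_BITS) - 1)
    function = word & ((1 << FUNCTION_BITS) - 1)
    return f"{opcode:02X}.{function:04X}"


# Required PALcode opcodes, copied from Table C-16.
REQUIRED = {"DRAINA": "00.0002", "HALT": "00.0000", "IMB": "00.0086"}

for mnemonic, notation in REQUIRED.items():
    print(f"{mnemonic:<7}{notation}  word=0x{encode(notation):08X}")
```

Because the PALcode opcode is 00, the encoded word for each required instruction is numerically equal to its function code.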


C.12 Opcodes Reserved to PALcode 


The opcodes listed in Table C-17 are reserved for use in implementing PALcode.


Table C-17: Opcodes Reserved for PALcode


Mnemonic Mnemonic Mnemonic
PAL19 19 PAL1B 1B PAL1D 1D
PAL1E 1E PAL1F 1F
C.13 Opcodes Reserved to DIGITAL


The opcodes listed in Table C-18 are reserved to DIGITAL.


Table C-18: Opcodes Reserved for DIGITAL


Mnemonic Mnemonic Mnemonic
OPC01 01 OPC02 02 OPC03 03
OPC04 04 OPC05 05 OPC06 06
OPC07 07

Programming Note: 


The code points 18.4800 and 18.4C00 are reserved for adding weaker memory barrier 
instructions. Those code points must operate as a Memory Barrier instruction (MB 
18.4000) for implementations that precede their definition as weaker memory barrier 
instructions. Software must use the 18.4000 code point for MB. 


\Opcodes 02 and 06 are nominally reserved for future extensions to octaword load/store 
for both integer and floating-point formats. 


For IEEE floating-point opcode 16 (hexadecimal), if the function code field bits<5:4> are
01 (binary) or the function code bits<3:0> are 1101 (binary), an illegal instruction trap
is taken. This allows for future additions of the extended IEEE format.\


C.14 Unused Function Code Behavior 


Unused function codes for all opcodes assigned (not reserved) in the Version 5 Alpha
architecture specification (May 1992) produce UNPREDICTABLE but not UNDEFINED results;
they are not security holes.


Unused function codes for opcodes defined as reserved in the Version 5 Alpha architecture
specification produce an illegal instruction trap. Those opcodes are 01, 02, 03, 04, 05, 06, 07,
0A, 0C, 0D, 0E, 14, 19, 1B, 1C, 1D, 1E, and 1F. Unused function codes for those opcodes
reserved to PALcode produce an illegal instruction trap only if not used in the PALcode
environment.
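

The rules in this section can be read as a small decision procedure. The Python sketch below encodes them under stated assumptions: the function name and the returned strings are illustrative, the reserved set is the opcode list given above read as hexadecimal, and the PALcode-reserved subset (19, 1B, 1D, 1E, 1F) is taken from Section C.12. Behavior of PALcode-reserved opcodes inside the PALcode environment is not described in this section, so the sketch reports it as unspecified.

```python
# Opcodes defined as reserved in the Version 5 specification (hexadecimal).
RESERVED_IN_V5 = {
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x0A, 0x0C, 0x0D, 0x0E, 0x14,
    0x19, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
}

# Subset reserved to PALcode (Section C.12).
RESERVED_TO_PALCODE = {0x19, 0x1B, 0x1D, 0x1E, 0x1F}


def unused_function_code_result(opcode: int, in_palcode_environment: bool = False) -> str:
    """Describe what an unused function code produces for the given opcode."""
    if opcode in RESERVED_TO_PALCODE and in_palcode_environment:
        # The trap applies only outside the PALcode environment.
        return "not specified in this section"
    if opcode in RESERVED_IN_V5:
        return "illegal instruction trap"
    # Assigned (not reserved) opcodes.
    return "UNPREDICTABLE"


for op in (0x02, 0x10, 0x19):
    print(f"opcode {op:02X}: {unused_function_code_result(op)}")
```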
C.15 ASCII Character Set


Table C-19 shows the 7-bit ASCII character set and the corresponding hexadecimal value for
each character.


Table C-19: ASCII Character Set 


      Hex         Hex         Hex         Hex
Char  Code  Char  Code  Char  Code  Char  Code
NUL   0     SP    20    @     40    `     60
SOH   1     !     21    A     41    a     61
STX   2     "     22    B     42    b     62
ETX   3     #     23    C     43    c     63
EOT   4     $     24    D     44    d     64
ENQ   5     %     25    E     45    e     65
ACK   6     &     26    F     46    f     66
BEL   7     '     27    G     47    g     67
BS    8     (     28    H     48    h     68
HT    9     )     29    I     49    i     69
LF    A     *     2A    J     4A    j     6A
VT    B     +     2B    K     4B    k     6B
FF    C     ,     2C    L     4C    l     6C
CR    D     -     2D    M     4D    m     6D
SO    E     .     2E    N     4E    n     6E
SI    F     /     2F    O     4F    o     6F
DLE   10    0     30    P     50    p     70
DC1   11    1     31    Q     51    q     71
DC2   12    2     32    R     52    r     72
DC3   13    3     33    S     53    s     73
DC4   14    4     34    T     54    t     74
NAK   15    5     35    U     55    u     75
SYN   16    6     36    V     56    v     76
ETB   17    7     37    W     57    w     77
CAN   18    8     38    X     58    x     78
EM    19    9     39    Y     59    y     79
SUB   1A    :     3A    Z     5A    z     7A
ESC   1B    ;     3B    [     5B    {     7B
FS    1C    <     3C    \     5C    |     7C
GS    1D    =     3D    ]     5D    }     7D
RS    1E    >     3E    ^     5E    ~     7E
US    1F    ?     3F    _     5F    DEL   7F
C.16 \Revision History


Revision 7.0, November 10, 1997

1. Alpha AXP --> Alpha
2. DEC OSF/1 --> DIGITAL UNIX
3. OpenVMS AXP --> OpenVMS Alpha
4. Windows NT AXP --> Windows NT Alpha
5. Added eco 97, Programming note for WMB
6. Added eco 81, LDBU, LDWU, STB, STW, SEXTB, SEXTW instructions
7. Added eco 84, VAX and IEEE SQRT instructions
8. Added eco 91, wtint/WTINT instruction
9. Added eco 87, CTLZ, CTPOP, CTTZ instructions
10. Added eco 94, AMASK and IMPLVER instructions
11. Added eco 96, WH64 and ECB instructions
12. Removed 1C and 14 from the reserved opcode list
13. Added eco 92, urti instruction
14. Added eco 101, clrfen/CLRFEN instruction
15. Added ECO 88, f-p --> i; i --> f-p moves
16. Added ECO 90, Multimedia (graphics and video) instructions


Revision 6.0, December 1994

1. Added ECO 62, reserved opcodes 14 and 1C
2. Added W-NT PALcode
3. Alpha --> Alpha AXP
4. Updated the PALcode opcode function code lists for all the ECOs
5. Added ECO 61, Unused Opcode function codes text
6. Fixed INTS from 'subtract' to 'shift'
7. Added EXCB, CVTST, and CVTGQ instructions
8. Added CSERVE and SWPPAL OpenVMS PALcodes
9. Added cflush, cserve, rdmces, swppal, wripir, wrmces, wrperfmon DEC OSF/1 PALcodes
10. Created new appendix from previous Appendix C and various tables from the Alpha
AXP Programming Quick Reference
11. Added programming note about CVTQx using /SUI qualifier instead of /SI


Revision 5.0, May 12, 1992

1. Added note on IEEE floating-point code 16, special function code fields
2. Added DRAINA to list of required PALcode instructions
3. Added ECO #17, #23
4. Converted to SDML
5. Removed /S and /SC opcodes from CVTQF and CVTQG instructions encodings
6. Corrected text by removing extra 'instructions' from Fig. C-3 text
7. Added CMPBGE to Operate format instruction encoding
8. Add opcode for READ_UNQ and WRITE_UNQ


Revision 4.0, March 29, 1991

1. Changed /P to /D
2. Added RSCC opcode
3. Added Scaled Add/Subtract opcodes
4. Removed references to D_float
5. Updated various opcodes per EV-4 request
6. Typos


Revision 3.0, March 2, 1990

1. Version 3.0 update


Revision 2.0, October 4, 1989

1. First Pass\
Appendix D


Registered System and Processor Identifiers 


This appendix contains a table of the processor type assignments, PALcode implementation
information, and the architecture mask (AMASK) and implementation value (IMPLVER)
assignments. \System identification and SMM values are located in the
TALLIS::ALPHA_SRM notesfile.\


D.1 Processor Type Assignments 


The following processor types are defined. 


Table D-1: Processor Type Assignments


Major Type                        Minor Type

1 = EV3

2 = EV4 (21064)                   0 = Pass 2 or 2.1
                                  1 = Pass 3 (also EV4s)

3 = Simulation

4 = LCA Family:                   0 = Reserved
    LCA4s (21066)                 1 = Pass 1 or 1.1 (21066)
    LCA4s embedded (21068)        2 = Pass 2 (21066)
    LCA45 (21066A, 21068A)        3 = Pass 1 or 1.1 (21068)
                                  4 = Pass 2 (21068)
                                  5 = Pass 1 (21066A)
                                  6 = Pass 1 (21068A)

5 = EV5 (21164)                   0 = Reserved (Pass 1)
                                  1 = Pass 2, 2.2 (rev BA, CA)
                                  2 = Pass 2.3 (rev DA, EA)
                                  3 = Pass 3
                                  4 = Pass 3.2
                                  5 = Pass 4

6 = EV45 (21064A)                 0 = Reserved
                                  1 = Pass 1
                                  2 = Pass 1.1
                                  3 = Pass 2

7 = EV56 (21164A)                 0 = Reserved
                                  1 = Pass 1
                                  2 = Pass 2

8 = EV6 (21264)                   0 = Reserved
                                  1 = Pass 1

9 = PCA56 (21164PC)               0 = Reserved
                                  1 = Pass 1


For OpenVMS Alpha and DIGITAL UNIX, the processor types are stored in the Per-CPU Slot 
Table (SLOT[176]), pointed to by HWRPB[160]. 


D.2 PALcode Variation Assignments 


The PALcode variation assignments are as follows: 


Table D-2: PALcode Variation Assignments


Token     PALcode Type              Summary Table
0         Console                   N/A
1         OpenVMS Alpha             Console Interface (III), Chapter 3
2         DIGITAL UNIX              Console Interface (III), Chapter 3
3-127     Reserved to DIGITAL
128-255   Reserved to non-DIGITAL
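

As an illustration of how the token ranges partition the 8-bit variation space, the following Python sketch classifies a token value. The function name is hypothetical; the ranges and type names are the ones listed in Table D-2.

```python
def palcode_variation(token: int) -> str:
    """Return the PALcode type assigned to a variation token (Table D-2)."""
    if not 0 <= token <= 255:
        raise ValueError("PALcode variation token must be in the range 0-255")
    if token == 0:
        return "Console"
    if token == 1:
        return "OpenVMS Alpha"
    if token == 2:
        return "DIGITAL UNIX"
    if token <= 127:
        return "Reserved to DIGITAL"
    return "Reserved to non-DIGITAL"


for token in (0, 1, 2, 3, 127, 128, 255):
    print(token, palcode_variation(token))
```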
D.3 Architecture Mask and Implementation Values


\Future mask and implementation values will be located in the TALLIS::ALPHA_SRM
notesfile.\


The following bits are defined for the AMASK instruction. 


Table D-3: AMASK Bit Assignments


Bit  Meaning

0    Support for the byte/word extension (BWX)
     The instructions that comprise the BWX extension are LDBU, LDWU, SEXTB,
     SEXTW, STB, and STW.

1    Support for the square-root and floating-point convert extension (FIX)
     The instructions that comprise the FIX extension are FTOIS, FTOIT, ITOFF,
     ITOFS, ITOFT, SQRTF, SQRTG, SQRTS, and SQRTT.

2    Support for the count extension (CIX)
     The instructions that comprise the CIX extension are CTLZ, CTPOP, and CTTZ.

8    Support for the multimedia extension (MVI)
     The instructions that comprise the MVI extension are MAXSB8, MAXSW4,
     MAXUB8, MAXUW4, MINSB8, MINSW4, MINUB8, MINUW4, PERR, PKLB,
     PKWB, UNPKBL, and UNPKBW.

9    Support for precise arithmetic trap reporting in hardware. The trap PC is the same
     as the instruction PC after the trapping instruction is executed.
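

Software typically tests these bits by passing a mask of the features it cares about to AMASK and inspecting the result. The Python sketch below models that check. It rests on two assumptions that should be verified against the AMASK instruction description: the bit positions (0 for BWX, 1 for FIX, 2 for CIX, 8 for MVI, 9 for precise arithmetic traps), and the convention that AMASK clears the bit of each feature the implementation supports. The function name is illustrative.

```python
# Assumed bit positions for the features described in Table D-3.
AMASK_BITS = {
    0: "BWX",
    1: "FIX",
    2: "CIX",
    8: "MVI",
    9: "precise arithmetic traps",
}


def features_present(requested_mask: int, amask_result: int) -> list:
    """List requested features whose bits were cleared in the AMASK result.

    Assumes a cleared bit means the feature is implemented.
    """
    present = []
    for bit, name in sorted(AMASK_BITS.items()):
        requested = (requested_mask >> bit) & 1
        still_set = (amask_result >> bit) & 1
        if requested and not still_set:
            present.append(name)
    return present


# Hypothetical example: ask about all five features; only bit 0 comes back clear.
print(features_present(0x307, 0x306))
```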


The following values are defined for the IMPLVER instruction. 


Table D-4: IMPLVER Value Assignments


Value  Meaning

0      21064 (EV4)
       21064A (EV45)
       21066A/21068A (LCA45)

1      21164 (EV5)
       21164A (EV56)
       21164PC (PCA56)

2      21264 (EV6)
D.4 \Revision History


Revision 7.0, November 10, 1997
1. Alpha AXP --> Alpha
2. Added ECO 112, splitting out FIX from CIX
3. Added ECO 94, IMPLVER and AMASK
4. OpenVMS AXP --> OpenVMS Alpha
5. DEC OSF/1 --> DIGITAL UNIX
6. Digital --> DIGITAL
7. Moved tables D-1 and D-4 to the ALPHA_SRM notes file


Revision 6.0, December 12, 1994
1. Created the system identification table (D-1) from the ALPHA_SRM system registry
2. Updated the Processor Type Assignments
3. Moved public portions of tables D-1 and D-2 to Chapter III-2 (HWRPB)
4. Alpha --> Alpha AXP
5. Added ECO 39, PALcode Implementation Information
6. Added the DSRDB SMM Number table


Revision 5.0, May 12, 1992
1. Added XMI and Future+ tables from I/O chapter
2. Added Jensen identifier
3. Added graphics variation bit (9)\
Appendix E


Waivers and Implementation-Dependent Functionality


This appendix describes waivers to the Alpha architecture and functionality that is specific to 
particular hardware implementations. 


E.1 Waivers 


The following waivers have been passed for the Alpha architecture. 


E.1.1 DECchip 21064, DECchip 21066, and DECchip 21068 IEEE Divide Instruction Violation


The DECchip 21064, DECchip 21066, and DECchip 21068 CPUs violate the architected han- 
dling of IEEE divide instructions DIVS and DIVT with respect to reporting Inexact Result 
exceptions. 


Note: 


The DECchip 21064A, DECchip 21066A, and DECchip 21068A CPUs are compliant and 
require no waiver. The DECchip 21164 is also compliant. 


As specified by the architecture, floating-point exceptions generated by the CPU are recorded 
in two places for all IEEE floating-point instructions: 


1. If an exception is detected and the corresponding trap is enabled (such as ADD/U for 
underflow), the CPU initiates a trap and records the exception in the exception sum- 
mary register (EXC_SUM). 


2. The exceptions are also recorded as flags that can be tested in the floating-point control 
register (FPCR). The FPCR can only be accessed with MTPR/MFPR instructions and 
an explicit MT_FPCR is required to clear the FPCR. The FPCR is updated irrespective 
of whether the trap is enabled or not. 
The DECchip 21064, DECchip 21066, and DECchip 21068 implementations differ from the
above specification in handling the Inexact condition for the IEEE DIVS and DIVT
instructions in two ways:


1. The DIVS and DIVT instructions with the /Inexact modifier trap unconditionally and
report the INE exception in the EXC_SUM register (except for NaN, infinity, and
denormal inputs that result in INVs). This allows for a software calculation to
determine the correct INE status.


2. The FPCR <INE> bit is never set by DIVS or DIVT. This is because the DECchip
21064, DECchip 21066, and DECchip 21068 do not include hardware to determine that
particular exactness.


E.1.2 DECchip 21064, DECchip 21066, and DECchip 21068 Write Buffer Violation


The DECchip 21064, DECchip 21066, and DECchip 21068 CPUs can be made to violate the 
architecture by, under one contrived case, indefinitely delaying a buffered off-chip write. 


Note: 


The DECchip 21064A, DECchip 21066A, and DECchip 21068A CPUs are compliant and 
require no waiver. The DECchip 21164 is also compliant. 


The CPUs in violation can send a buffered write off-chip when one of the following
conditions is met:


1. The write buffer contains at least two valid entries. 


2. The write buffer contains one valid entry and 256 cycles have elapsed since the execu- 
tion of the last write. 


3. The write buffer contains an MB or STx_C instruction. 
4. A load miss hits an entry in the write buffer. 


The write can be delayed indefinitely under condition 2 above, when there is an indefinite 
stream of writes to addresses within the same aligned 32-byte write buffer block. 
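

The starvation case can be illustrated with a small cycle-level model of condition 2 alone. This Python sketch is not a description of the actual write buffer logic: it assumes that every write to the same aligned 32-byte block merges into the single valid entry and restarts the 256-cycle timer, so that conditions 1, 3, and 4 never apply. The function name and parameters are hypothetical.

```python
def first_send_cycle(write_cycles, horizon, timeout=256):
    """Return the first cycle at which condition 2 lets the entry go off-chip.

    write_cycles: cycles at which a write to the same 32-byte block executes.
    horizon:      number of cycles to simulate.
    Returns None if the write is still buffered when the simulation ends.
    """
    pending = set(write_cycles)
    last_write = None
    for cycle in range(horizon):
        if cycle in pending:
            last_write = cycle  # merging write restarts the timer (assumption)
        elif last_write is not None and cycle - last_write >= timeout:
            return cycle
    return None


print(first_send_cycle([0], 1000))                  # single write, then idle
print(first_send_cycle(range(0, 1000, 100), 1000))  # a write every 100 cycles
```

In this model a lone write leaves the buffer once 256 idle cycles have passed, while a stream of writes spaced fewer than 256 cycles apart keeps the entry buffered for as long as the stream continues.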


E.1.3 DECchip 21264 LDx_L/STx_C with WH64 Violation


The DECchip 21264 violates the architected relationship between the LDx_L and STx_C 
instructions when an intervening WH64 instruction is executed. 


As specified by the common architecture, in Section 4.2.4:

If any other memory access (ECB, LDx, LDQ_U, STx, STQ_U, WH64) is executed on
the given processor between the LDx_L and the STx_C, the sequence above may always
fail on some implementations; hence, no useful program should do this.


The DECchip 21264 varies from that description, with regard to the WH64 instruction, as
follows:


If any other memory access (ECB, LDx, LDQ_U, STx, STQ_U) is executed on the given 
processor between the LDx_L and the STx_C, the sequence above may always fail on 
some implementations; hence, no useful program should do this. 


If a WH64 memory access is executed on any given 21264 processor between the LDx_L 
and STx_C, and: 


— The WH64 access is to the same aligned 64-byte block that STx_C is accessing, 
and 


— No CALL_PAL REI, rei, or rfe instruction has been executed since the most-recent
LDx_L (ensuring that the sequence cannot occur as the result of unfortunate
coincidences with interrupts)


then, the load-locked/store-conditional sequence may sometimes fail when it would 
otherwise succeed and sometimes succeed when it otherwise would fail; hence no useful 
program should do this. 


E.2 Implementation-Specific Functionality


The following functionality, although a documented part of the Alpha architecture, is
implemented in a manner that is specific to the particular hardware implementation.


E.2.1 DECchip 21064/21066/21068 Performance Monitoring


Note: 


All functions, arguments, and descriptions in this section apply to the DECchip 
21064/21064A, 21066/21066A, and 21068/21068A. 


PALcode instructions control the DECchip 21064/21066/21068 on-chip performance 
counters. For OpenVMS Alpha, the instruction is MTPR_PERFMON; for DIGITAL UNIX 
and Windows NT Alpha, the instruction is wrperfmon. 


The instruction arguments and results are described in the following sections. The scratch reg- 
ister usage is operating system specific. 


Two on-chip counters count events. The bit width of the counters (8, 12, or 16 bits) can be 
selected and the event that they count can be switched among a number of available events. 
One possible event is an "external" event. For example, the processor board can supply an 
event that causes the counter to increment. In this manner, off-chip events can be counted. 


The two counters can be switched independently. There is no hardware support for reading,
writing, or resetting the counters. The only way to monitor the counters is to enable them to
cause an interrupt on overflow.


The performance monitor functions, described in Section E.2.1.2, can provide the following, 
depending on implementation: 


• Enable the performance counters to interrupt and trap into the performance monitoring
vector in the operating system.


• Disable the performance counter from interrupting. This does not necessarily mean that
the counters will stop counting.


• Select which events will be monitored and set the width of the two counters.


• In the case of OpenVMS Alpha and DIGITAL UNIX, implementations can choose to
monitor selected processes. If that option is selected, the PME bit in the PCB controls
the enabling of the counters. Since the counters cannot be read/written/reset, if more
than one process is being monitored, the rounding error may become significant.


E.2.1.1 DECchip 21064/21066/21068 Performance Monitor Interrupt Mechanism 


The performance monitoring interrupt mechanism varies according to the particular operating
system.


For the OpenVMS Alpha Operating System 

When a counter overflows and interrupt enabling conditions are correct, the counter causes an 
interrupt to PALcode. The PALcode builds an appropriate stack frame. The PALcode then dis- 
patches in the form of an exception (not in the form of an interrupt) to the operating system by 
vectoring to the SCB performance monitor entry point through SCBB+650 
(HWSCB$Q_PERF_MONITOR), at IPL 29, in kernel mode. 


Two interrupts are generated if both counters overflow. For each interrupt, the status of each 
counter overflow is indicated by register R4: 


R4 = 0 if performance counter 0 caused the interrupt 
R4 = 1 if performance counter 1 caused the interrupt 


When the interrupt is taken, the PC is saved on the stack frame as the old PC. 


For the DIGITAL UNIX Operating System

When a counter overflows and interrupt enabling conditions are correct, the counter causes an
interrupt to PALcode. The PALcode builds an appropriate stack frame and dispatches to the
operating system by vectoring to the interrupt entry point entINT, at IPL 6, in kernel mode.


Two interrupts are generated if both counters overflow. For each interrupt, registers a0..a2 are 
as follows: 


a0 = osfint$c_perf (4)
a1 = scb$v_perfmon (650)
a2 = 0 if performance counter 0 caused the interrupt
a2 = 1 if performance counter 1 caused the interrupt


For the Windows NT Alpha Operating System


When a counter overflows and interrupt enabling conditions are correct, the counter causes an 
interrupt to PALcode. The PALcode builds a frame on the kernel stack and dispatches to the 
Kernel at the interrupt entry point. 
E.2.1.2 Functions and Arguments for the DECchip 21064/21066/21068


The functions execute on a single (the current running) processor only and are described in
Table E-1.


• The OpenVMS Alpha MTPR_PERFMON instruction is called with a function code in
R16, a function-specific argument in R17, and status is returned in R0.

• The DIGITAL UNIX wrperfmon instruction is called with a function code in a0, a
function-specific argument in a1, and status is returned in v0.

• The Windows NT Alpha wrperfmon instruction is called with input parameters a0
through a3, as shown in Table E-1.


Table E-1: DECchip 21064/21066/21068 Performance Monitoring Functions


Function   Register Usage      Comments


Enable performance monitoring  Enable takes effect at the next IPL change

  DIGITAL UNIX
    Input:   a0 = 1            Function code
             a1 = 0            Argument
    Output:  v0 = 1            Success
             v0 = 0            Failure (not generated)
  OpenVMS Alpha
    Input:   R16 = 1           Function code
             R17 = 0           Argument
    Output:  R0 = 1            Success
             R0 = 0            Failure (not generated)
  Windows NT Alpha
    Input:   a0 = 0            Select counter 0
             a0 = 1            Select counter 1
             a1 = 1            Enable selected counter


Disable performance monitoring Disable takes effect at the next IPL change

  DIGITAL UNIX
    Input:   a0 = 0            Function code
             a1 = 0            Argument
    Output:  v0 = 1            Success
             v0 = 0            Failure (not generated)
  OpenVMS Alpha
    Input:   R16 = 0           Function code
             R17 = 0           Argument
    Output:  R0 = 1            Success
             R0 = 0            Failure (not generated)
  Windows NT Alpha
    Input:   a0 = 0            Select counter 0
             a0 = 1            Select counter 1
             a1 = 0            Disable selected counter


Select desired events (mux_ctl)

  DIGITAL UNIX
    Input:   a0 = 2            Function code
             a1 = mux_ctl      mux_ctl is the exact contents of those fields
                               from the ICCSR register, in write format,
                               described in Table E-2.
    Output:  v0 = 1            Success
             v0 = 0            Failure (not generated)
  OpenVMS Alpha
    Input:   R16 = 2           Function code
             R17 = mux_ctl     mux_ctl is the exact contents of those fields
                               from the ICCSR register, in write format,
                               described in Table E-2.
    Output:  R0 = 1            Success
             R0 = 0            Failure (not generated)
  Windows NT Alpha
    Input:   a2 = PCMUX0       For ICCSR<PCMUX0> field when a0 = 0
             a2 = PCMUX1       For ICCSR<PCMUX1> field when a0 = 1
             a3 = PC0          For ICCSR<PC0> field when a0 = 0
             a3 = PC1          For ICCSR<PC1> field when a0 = 1


Select performance monitoring options

  DIGITAL UNIX
    Input:   a0 = 3            Function code
             a1 = opt          Function argument opt is:
                               <0> = log all processes if set
                               <1> = log only selected if set
    Output:  v0 = 1            Success
             v0 = 0            Failure (not generated)
  OpenVMS Alpha
    Input:   R16 = 3           Function code
             R17 = opt         Function argument opt is:
                               <0> = log all processes if set
                               <1> = log only selected if set
    Output:  R0 = 1            Success
             R0 = 0            Failure (not generated)


Table E-2: DECchip 21064/21066/21068 MUX Control Fields in ICCSR Register


Bits    Option   Description

34:32   PCMUX1   Event selection, counter 1:

                 Value  Description
                 0      Total D-cache misses
                 1      Total I-cache misses
                 2      Cycles of dual issue
                 3      Branch mispredicts (conditional, JSR, HW_REI)
                 4      FP operate instructions (not BR, LOAD, STORE)
                 5      Integer operates (including LDA, LDAH into R0-R30)
                 6      Total store instructions
                 7      External events supplied by pin

11:8    PCMUX0   Event selection, counter 0:

                 Value  Description
                 0      Total issues divided by 2
                 1      Unused
                 2      Nothing issued, no valid I-stream data
                 3      Unused
                 4      All load instructions
                 5      Unused
                 6      Nothing issued, resource conflict
                 7      Unused
                 8      All branches (conditional, unconditional, JSR, HW_REI)
                 9      Unused
                 10     Total cycles
                 11     Cycles while in PALcode environment
                 12     Total nonissues divided by 2
                 13     Unused
                 14     External event supplied by pin
                 15     Unused

3       PC0      Frequency setting, counter 0:

                 Value  Description
                 0      2**16 (65536) events per interrupt
                 1      2**12 (4096) events per interrupt

0       PC1      Frequency setting, counter 1:

                 Value  Description
                 0      2**12 (4096) events per interrupt
                 1      2**8 (256) events per interrupt
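

A mux_ctl argument can be assembled by shifting each field to the bit position listed in Table E-2. The Python sketch below does so; the function name, parameter names, and lookup dictionaries are illustrative, and all bits outside the four listed fields are assumed to be zero.

```python
# Events counted before an interrupt, per frequency setting (Table E-2).
PC0_EVENTS_PER_INTERRUPT = {0: 2**16, 1: 2**12}
PC1_EVENTS_PER_INTERRUPT = {0: 2**12, 1: 2**8}


def make_mux_ctl(pcmux0: int, pcmux1: int, pc0: int, pc1: int) -> int:
    """Build a mux_ctl value from the four ICCSR fields in write format."""
    # (value, low bit position, width in bits)
    fields = (
        (pcmux1, 32, 3),  # bits 34:32, event selection for counter 1
        (pcmux0, 8, 4),   # bits 11:8, event selection for counter 0
        (pc0, 3, 1),      # bit 3, frequency setting for counter 0
        (pc1, 0, 1),      # bit 0, frequency setting for counter 1
    )
    word = 0
    for value, shift, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
    return word


# Counter 0 counts total cycles (10); counter 1 counts D-cache misses (0).
mux_ctl = make_mux_ctl(pcmux0=10, pcmux1=0, pc0=0, pc1=0)
print(hex(mux_ctl), PC0_EVENTS_PER_INTERRUPT[0], PC1_EVENTS_PER_INTERRUPT[0])
```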
E.2.2 DECchip 21164/21164PC Performance Monitoring


Unless otherwise stated, the term "21164" in this section means implementations of the 21164 
at all frequencies. 


PALcode instructions control the DECchip 21164/21164PC on-chip performance counters.
For OpenVMS Alpha, the instruction is MTPR_PERFMON; for DIGITAL UNIX and
Windows NT Alpha, the instruction is wrperfmon.

The instruction arguments and results are described in the following sections. The scratch reg- 
ister usage is operating system specific. 


Three on-chip counters count events. Counters 0 and 1 are 16-bit counters; counter 2 is a 
14-bit counter. Each counter can be individually programmed. Counters can be read and writ- 
ten and are not required to interrupt. The counters can be collectively restricted according to 
the processor mode. 


Processes can be selectively monitored with the PME bit. 


E.2.2.1 Performance Monitor Interrupt Mechanism 


The performance monitoring interrupt mechanism varies according to the particular operating 
system. 


For the OpenVMS Alpha Operating System 

When a counter overflows and interrupt enabling conditions are correct, the counter causes an 
interrupt to PALcode. The PALcode builds an appropriate stack frame. The PALcode then dis- 
patches in the form of an exception (not in the form of an interrupt) to the operating system by 
vectoring to the SCB performance monitor entry point through SCBB+650 
(HWSCB$Q_PERF_MONITOR), at IPL 29, in kernel mode.


An interrupt is generated for each counter overflow. For each interrupt, the status of each 
counter overflow is indicated by register R4: 


R4 = 0 if performance counter 0 caused the interrupt
R4 = 1 if performance counter 1 caused the interrupt 
R4 = 2 if performance counter 2 caused the interrupt 


When the interrupt is taken, the PC is saved on the stack frame as the old PC.


For the DIGITAL UNIX Operating System

When a counter overflows and interrupt enabling conditions are correct, the counter causes an
interrupt to PALcode. The PALcode builds an appropriate stack frame and dispatches to the
operating system by vectoring to the interrupt entry point entINT, at IPL 6, in kernel mode.


An interrupt is generated for each counter overflow. For each interrupt, registers a0..a2 are as
follows:


a0 = osfint$c_perf (4)
a1 = scb$v_perfmon (650)
a2 = 0 if performance counter 0 caused the interrupt
a2 = 1 if performance counter 1 caused the interrupt


For the Windows NT Alpha Operating System

When a counter overflows and interrupt enabling conditions are correct, the counter causes an
interrupt to PALcode. The PALcode builds a frame on the kernel stack and dispatches to the
kernel at the interrupt entry point.


E.2.2.2 Windows NT Alpha Functions and Arguments


The functions for Windows NT Alpha execute on only a single (the current running) processor.
The wrperfmon instruction is called with the following input registers:


Input     Contents
Register  (Bits)    Meaning

a0        63-0      The register in Table E-3, which contains the value to be written
                    to the hardware PMCTR register.

a1        0         When a1 = 0, write a0 to the hardware PMCTR register.
                    When a1 = 1, read the hardware PMCTR register. The returned
                    PMCTR register is written to register v0.

a2                  Has meaning when PCSEL1 in Table E-3 has the value 0xF.
                    Contents are determined by processor type:

                    Processor  Contents  Reference
                    21164      CBOX1     Table E-15
                    21164PC    PM0_MUX   Table E-17

a3                  Has meaning when PCSEL2 in Table E-3 has the value 0xF.
                    Contents are determined by processor type:

                    Processor  Contents  Reference
                    21164      CBOX2     Table E-16
                    21164PC    PM1_MUX   Table E-18
Table E-3: Bit Summary of PMCTR Register for Windows NT Alpha


Bits    Name             Meaning

63-48   CTR0             Counter 0 value

47-32   CTR1             Counter 1 value

31      PCSEL0           Counter 0 selection:

                         Value  Meaning
                         0      Cycles
                         1      Issues

30                       Must be set to one [1]

29-16   CTR2             Counter 2 value

15-14   CTL0             Counter 0 control:

                         Value  Meaning
                         0      Counter disable, interrupt disable
                         1      Counter enable, interrupt disable
                         2      Counter enable, interrupt at count 65536
                         3      Counter enable, interrupt at count 256

13-12   CTL1             Counter 1 control:

                         Value  Meaning
                         0      Counter disable, interrupt disable
                         1      Counter enable, interrupt disable
                         2      Counter enable, interrupt at count 65536
                         3      Counter enable, interrupt at count 256

11-10   CTL2             Counter 2 control:

                         Value  Meaning
                         0      Counter disable, interrupt disable
                         1      Counter enable, interrupt disable
                         2      Counter enable, interrupt at count 16384
                         3      Counter enable, interrupt at count 256

9-8     MODE_SELECT [1]  Select modes in which to count:

                         Value  Meaning
                         0      Count all modes
                         1      Count PALmode only
                         2      Count all modes except PALmode
                         3      Count only user mode

7-4     PCSEL1           Counter 1 selection. See Table E-13

3-0     PCSEL2           Counter 2 selection. See Table E-14


[1] Windows NT Alpha uses bits 30 and 9-8 differently than as documented in the 21164
Hardware Reference Manual; it uses the processor executive mode to run user (nonprivileged)
code. Therefore, bit 30 is always set to one and bits 9-8 are used to select the mode.
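

The field layout in Table E-3 can be turned into a value for a0 by shifting each field into place. The following Python sketch packs a PMCTR value for the Windows NT Alpha wrperfmon call; the function and parameter names are illustrative and are not defined by the PALcode interface. Bit 30 is always set, as the table requires.

```python
def make_pmctr(ctr0=0, ctr1=0, ctr2=0, pcsel0=0, ctl0=0, ctl1=0, ctl2=0,
               mode_select=0, pcsel1=0, pcsel2=0) -> int:
    """Pack the Table E-3 fields into a 64-bit PMCTR value."""
    # (value, low bit position, width in bits)
    fields = (
        (ctr0, 48, 16),        # bits 63-48, counter 0 value
        (ctr1, 32, 16),        # bits 47-32, counter 1 value
        (pcsel0, 31, 1),       # bit 31, counter 0 selection
        (1, 30, 1),            # bit 30, must be set to one
        (ctr2, 16, 14),        # bits 29-16, counter 2 value
        (ctl0, 14, 2),         # bits 15-14, counter 0 control
        (ctl1, 12, 2),         # bits 13-12, counter 1 control
        (ctl2, 10, 2),         # bits 11-10, counter 2 control
        (mode_select, 8, 2),   # bits 9-8, modes in which to count
        (pcsel1, 4, 4),        # bits 7-4, counter 1 selection
        (pcsel2, 0, 4),        # bits 3-0, counter 2 selection
    )
    word = 0
    for value, shift, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
    return word


# Count issues on counter 0, interrupt at count 65536, user mode only.
print(hex(make_pmctr(pcsel0=1, ctl0=2, mode_select=3)))
```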


E.2.2.3 OpenVMS Alpha and DIGITAL UNIX Functions and Arguments 


The functions execute only on a single (the current running) processor and are described in 
Table E-4. 


The OpenVMS Alpha MTPR_PERFMON instruction is called with a function code in R16, a
function-specific argument in R17, and status is returned in R0.


The DIGITAL UNIX wrperfmon instruction is called with a function code in a0, a
function-specific argument in a1, and status is returned in v0.


Table E-4: OpenVMS Alpha and DIGITAL UNIX Performance Monitoring 
Functions 


Function    Register Usage          Comments


Enable performance monitoring; do not reset counters

DIGITAL UNIX
  Input:    a0 = 1                  Function code value
            a1 = arg                Argument from Table E-5
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 1                 Function code value
            R17 = arg               Argument from Table E-5
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Enable performance monitoring; start the counters from zero

DIGITAL UNIX
  Input:    a0 = 7                  Function code value
            a1 = arg                Argument from Table E-5
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 7                 Function code value
            R17 = arg               Argument from Table E-5
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Disable performance monitoring; do not reset counters

DIGITAL UNIX
  Input:    a0 = 0                  Function code value
            a1 = arg                Argument from Table E-6
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 0                 Function code value
            R17 = arg               Argument from Table E-6
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Select desired events (MUX_SELECT)

DIGITAL UNIX
  Input:    a0 = 2                  Function code value
            a1 = arg                Argument from Table E-7 or E-8
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 2                 Function code value
            R17 = arg               Argument from Table E-7 or E-8
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Select Processor Mode options

DIGITAL UNIX
  Input:    a0 = 3                  Function code value
            a1 = arg                Argument from Table E-9
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 3                 Function code value
            R17 = arg               Argument from Table E-9
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Select interrupt frequencies

DIGITAL UNIX
  Input:    a0 = 4                  Function code value
            a1 = arg                Argument from Table E-10
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 4                 Function code value
            R17 = arg               Argument from Table E-10
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Read the counters

DIGITAL UNIX
  Input:    a0 = 5                  Function code value
            a1 = arg                Argument from Table E-11
  Output:   v0 = val                Return value from Table E-11
OpenVMS Alpha
  Input:    R16 = 5                 Function code value
            R17 = arg               Argument from Table E-11
  Output:   R0 = val                Return value from Table E-11


Write the counters

DIGITAL UNIX
  Input:    a0 = 6                  Function code value
            a1 = arg                Argument from Table E-12
  Output:   v0 = 1                  Success
            v0 = 0                  Failure (not generated)
OpenVMS Alpha
  Input:    R16 = 6                 Function code value
            R17 = arg               Argument from Table E-12
  Output:   R0 = 1                  Success
            R0 = 0                  Failure (not generated)


Table E-5: 21164/21164PC Enable Counters for OpenVMS Alpha and DIGITAL 
UNIX 


Bits Meaning When Set 


2 Operate on counter 2 
1 Operate on counter 1 
0 Operate on counter 0 


Table E-6: 21164/21164PC Disable Counters for OpenVMS Alpha and DIGITAL 
UNIX 


Bits Meaning When Set 


2 Operate on counter 2 
1 Operate on counter 1 
0 Operate on counter 0 
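
The function codes of Table E-4 and the counter masks of Tables E-5 and E-6 can be summarized in a few lines. The sketch below only builds the (function code, argument) pair that would be placed in a0/a1 (DIGITAL UNIX) or R16/R17 (OpenVMS Alpha); it does not issue the PALcode call. All Python names are hypothetical labels chosen for this illustration.

```python
from enum import IntEnum


class PerfmonFunction(IntEnum):
    """Function codes from Table E-4 (names are illustrative only)."""
    DISABLE = 0             # argument from Table E-6
    ENABLE = 1              # argument from Table E-5
    SELECT_EVENTS = 2       # argument from Table E-7 or E-8
    SELECT_MODE_OPTIONS = 3 # argument from Table E-9
    SELECT_FREQUENCIES = 4  # argument from Table E-10
    READ_COUNTERS = 5       # argument and return value from Table E-11
    WRITE_COUNTERS = 6      # argument from Table E-12
    ENABLE_FROM_ZERO = 7    # argument from Table E-5


def counter_mask(*counters):
    """Set bit n for each counter n (0, 1, or 2) to operate on."""
    mask = 0
    for counter in counters:
        if counter not in (0, 1, 2):
            raise ValueError(f"no such counter: {counter}")
        mask |= 1 << counter
    return mask


def perfmon_call_args(function, argument):
    """Return the (function code, argument) register pair for a call."""
    return int(PerfmonFunction(function)), argument
```

For instance, `perfmon_call_args(PerfmonFunction.ENABLE_FROM_ZERO, counter_mask(0, 1, 2))` yields the pair `(7, 7)`: function code 7 with all three counters selected.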
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Table E-7: 21164 Select Desired Events for OpenVMS Alpha and DIGITAL 


UNIX

Bits    Name     Meaning

63:32            MBZ

31      PCSEL0   Counter 0 selection:

                 Value   Meaning
                 0       Cycles
                 1       Issues

30:25            MBZ

24:22   CBOX2    CBOX2 event selection (only has meaning when event selection field
                 PCSEL2 is value <15>; otherwise MBZ). CBOX2 described in Table E-16.

21:19   CBOX1    CBOX1 event selection (only has meaning when event selection field
                 PCSEL1 is value <15>; otherwise MBZ). CBOX1 described in Table E-15.

18:8             MBZ

7:4     PCSEL1   Counter 1 event selection. PCSEL1 described in Table E-13.
3:0     PCSEL2   Counter 2 event selection. PCSEL2 described in Table E-14.
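
As an illustration of Table E-7, the following sketch assembles the function 2 argument for the 21164 and enforces the rule that the CBOX1 and CBOX2 fields must be zero unless the matching PCSEL field holds 15. The function name and keyword parameters are hypothetical; only the bit positions come from the table.

```python
def select_events_21164(pcsel0=0, pcsel1=0, pcsel2=0, cbox1=0, cbox2=0):
    """Build the Table E-7 argument; all unnamed bits stay zero (MBZ)."""
    limits = {"pcsel0": (pcsel0, 1), "pcsel1": (pcsel1, 15),
              "pcsel2": (pcsel2, 15), "cbox1": (cbox1, 7), "cbox2": (cbox2, 7)}
    for name, (value, maximum) in limits.items():
        if not 0 <= value <= maximum:
            raise ValueError(f"{name}={value} is out of range 0..{maximum}")
    # CBOX fields only have meaning when the matching PCSEL field is 15.
    if cbox1 and pcsel1 != 15:
        raise ValueError("CBOX1 must be zero unless PCSEL1 is 15")
    if cbox2 and pcsel2 != 15:
        raise ValueError("CBOX2 must be zero unless PCSEL2 is 15")
    return ((pcsel0 << 31) | (cbox2 << 22) | (cbox1 << 19)
            | (pcsel1 << 4) | pcsel2)
```

With these assumptions, `select_events_21164(pcsel0=1, pcsel1=11, pcsel2=6)` counts issues on counter 0, load instructions on counter 1, and D-cache misses on counter 2.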


Table E-8: 21164PC Select Desired Events for OpenVMS Alpha and DIGITAL 
UNIX 


Bits    Name      Meaning

63:32             MBZ

31      PCSEL0    Counter 0 selection:

                  Value   Meaning
                  0       Cycles
                  1       Issues

30:14             MBZ

13:11   PM1_MUX   PM1_MUX event selection (only has meaning when event selection
                  field PCSEL2 is value <15>; otherwise MBZ). PM1_MUX is
                  described in Table E-18.

10:8    PM0_MUX   PM0_MUX event selection (only has meaning when event selection
                  field PCSEL1 is value <15>; otherwise MBZ). PM0_MUX is
                  described in Table E-17.

7:4     PCSEL1    Counter 1 event selection. PCSEL1 described in Table E-13.
3:0     PCSEL2    Counter 2 event selection. PCSEL2 described in Table E-14.


Table E-9: 21164/21164PC Select Special Options for OpenVMS Alpha and
DIGITAL UNIX


Bits    Meaning

63:31   MBZ
30      Stop count in user mode
29:10   MBZ
9       Stop count in PALmode
8       Stop count in kernel mode
7:1     MBZ
0       Monitor selected processes (when clear, monitor all processes)


Setting any of the "NOT" bits causes the counters to not count when the processor is running
in the specified mode. Under OpenVMS Alpha, "NOT_KERNEL" also stops the count in
executive and supervisor mode, except as noted below:


NOT_BITS     Counters Operate Under These Modes When Bits Set:
K   U   P

0   0   0    K E S U P
0   0   1    K E S U
0   1   0    K E S P
0   1   1    K E S
1   0   0    U P
1   0   1    U
1   1   0    P
1   1   1    E S (here "NOT_KERNEL" stops kernel counter only)


Note:

DIGITAL UNIX counts user mode by using the executive counter; that is, the count for
executive mode is returned as the user mode count.
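
The NOT_BITS combinations under OpenVMS Alpha can be expressed as a short function. This is a sketch under stated assumptions: the bit positions (30, 9, 8, 0) are those of Table E-9, while the constant names NOT_USER, NOT_PAL, and SELECT_PROCESSES are hypothetical (only "NOT_KERNEL" appears in the text). The result is the set of modes that are still counted, written with the letters K, E, S, U, and P.

```python
# Bit positions from Table E-9; names other than NOT_KERNEL are illustrative.
NOT_USER = 1 << 30         # stop count in user mode
NOT_PAL = 1 << 9           # stop count in PALmode
NOT_KERNEL = 1 << 8        # stop count in kernel mode
SELECT_PROCESSES = 1 << 0  # monitor selected processes only


def counted_modes_openvms(arg):
    """Return the modes still counted for a Table E-9 argument."""
    not_k = bool(arg & NOT_KERNEL)
    not_u = bool(arg & NOT_USER)
    not_p = bool(arg & NOT_PAL)
    if not_k and not_u and not_p:
        # Special case: NOT_KERNEL stops the kernel counter only.
        return "ES"
    modes = ""
    if not not_k:
        modes += "KES"  # NOT_KERNEL normally also stops executive and supervisor
    if not not_u:
        modes += "U"
    if not not_p:
        modes += "P"
    return modes
```

The special case for all three bits set mirrors the last row of the NOT_BITS table, where only executive and supervisor modes remain counted.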
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Table E-10: 21164/21164PC Select Desired Frequencies for OpenVMS Alpha and 
DIGITAL UNIX


Table E-10 contains the selection definitions for each of the three counters. All frequency
fields are two-bit fields with the following values defined:


Bits    Meaning When Set

63:10   MBZ

9:8     Counter 0 frequency:

        Value   Meaning
        0       Do not interrupt
        1       Unused
        2       Low frequency (2**16 (65536) events per interrupt)
        3       High frequency (2**8 (256) events per interrupt)

7:6     Counter 1 frequency:

        Value   Meaning
        0       Do not interrupt
        1       Unused
        2       Low frequency (2**16 (65536) events per interrupt)
        3       High frequency (2**8 (256) events per interrupt)

5:4     Counter 2 frequency:

        Value   Meaning
        0       Do not interrupt
        1       Unused
        2       Low frequency (2**14 (16384) events per interrupt)
        3       High frequency (2**8 (256) events per interrupt)

3:0     MBZ
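
A minimal sketch of how the function 4 argument of Table E-10 might be assembled follows. The constant and function names are hypothetical; the field positions and the events-per-interrupt figures are taken from the table. Note that counter 2 differs from the other two at low frequency (2**14 rather than 2**16 events).

```python
DO_NOT_INTERRUPT, LOW_FREQUENCY, HIGH_FREQUENCY = 0, 2, 3

# Low bit of each counter's two-bit frequency field (Table E-10).
_SHIFT = {0: 8, 1: 6, 2: 4}
# Events per interrupt at low frequency; counter 2 uses 2**14.
_LOW_EVENTS = {0: 1 << 16, 1: 1 << 16, 2: 1 << 14}


def frequency_argument(ctr0=0, ctr1=0, ctr2=0):
    """Build the Table E-10 argument; value 1 is unused and rejected."""
    arg = 0
    for counter, value in ((0, ctr0), (1, ctr1), (2, ctr2)):
        if value not in (DO_NOT_INTERRUPT, LOW_FREQUENCY, HIGH_FREQUENCY):
            raise ValueError(f"counter {counter}: invalid frequency {value}")
        arg |= value << _SHIFT[counter]
    return arg


def events_per_interrupt(counter, value):
    """Return the interrupt period in events, or None if no interrupt."""
    if value == LOW_FREQUENCY:
        return _LOW_EVENTS[counter]
    if value == HIGH_FREQUENCY:
        return 1 << 8
    return None
```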
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Table E-11; 21164/21164PC Read Counters for OpenVMS Alpha and DIGITAL 


UNIX


Bits    Meaning When Returned

63:48   Counter 0 returned value
47:32   Counter 1 returned value
31:30   MBZ
29:16   Counter 2 returned value
15:1    MBZ
0       Set means success; clear means failure


Table E-12: 21164/21164PC Write Counters for OpenVMS Alpha and DIGITAL
UNIX


Bits    Meaning

63:48   Counter 0 written value
47:32   Counter 1 written value
31:30   MBZ
29:16   Counter 2 written value
15:0    MBZ
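
The read and write layouts of Tables E-11 and E-12 share the same counter positions, which the following sketch makes explicit. The helper names are hypothetical. Counters 0 and 1 are 16 bits wide and counter 2 is 14 bits wide, as implied by the bit ranges in the tables.

```python
def encode_write_argument(counter0, counter1, counter2):
    """Build the Table E-12 argument for the write-counters function."""
    if not 0 <= counter0 < (1 << 16) or not 0 <= counter1 < (1 << 16):
        raise ValueError("counters 0 and 1 are 16-bit values")
    if not 0 <= counter2 < (1 << 14):
        raise ValueError("counter 2 is a 14-bit value")
    return (counter0 << 48) | (counter1 << 32) | (counter2 << 16)


def decode_read_result(value):
    """Split the Table E-11 return value of the read-counters function."""
    return {
        "success": bool(value & 1),          # bit 0: set means success
        "counter0": (value >> 48) & 0xFFFF,  # bits 63:48
        "counter1": (value >> 32) & 0xFFFF,  # bits 47:32
        "counter2": (value >> 16) & 0x3FFF,  # bits 29:16
    }
```

A value written with `encode_write_argument` decodes to the same three counts, differing from a successful read result only in bit 0.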


Table E-13: 21164/21164PC Counter 1 (PCSEL1) Event Selection


The following values choose the counter 1 (PCSEL1) event selection:


Value   Meaning

0       Nothing issued, pipeline frozen
1       Some but not all issuable instructions issued
2       Nothing issued, pipeline dry
3       Replay traps (ldu, wb/maf, litmus test)
4       Single issue cycles
5       Dual issue cycles
6       Triple issue cycles
7       Quad issue cycles
8       Flow change (all branches, jsr-ret, hw_rei), where:
          If PCSEL2 has value 3, flow change is a conditional branch
          If PCSEL2 has value 2, flow change is a JSR-RET
9       Integer operate instructions
10      Floating point operate instructions
11      Load instructions
12      Store instructions
13      Instruction cache access
14      Data cache access
15      For the 21164, use CBOX1 event selection in Table E-15.
        For the 21164PC, use PM0_MUX event selection in Table E-17.


Table E-14: 21164/21164PC Counter 2 (PCSEL2) Event Selection


The following values choose the counter 2 (PCSEL2) event selection:


Value   Meaning

0       Long stalls (> 15 cycles)
1       Unused value
2       PC mispredicts
3       Branch mispredicts
4       I-cache misses
5       ITB misses
6       D-cache misses
7       DTB misses
8       Loads merged in MAF
9       LDU replays
10      WB/MAF full replays
11      Event from external pin
12      Cycles
13      Memory barrier instructions
14      LDx/L instructions
15      For the 21164, use CBOX2 event selection in Table E-16.
        For the 21164PC, use PM1_MUX event selection in Table E-18.
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Table E-15: 21164 CBOX1 Event Selection 


The following values choose the CBOX1 event selection.


Value   Meaning

0       S-cache access
1       S-cache read
2       S-cache write
3       S-cache victim
4       Unused value
5       B-cache hit
6       B-cache victim
7       System request


Table E-16: 21164 CBOX2 Event Selection 


The following values choose the CBOX2 event selection. 


Value   Meaning

0       S-cache misses
1       S-cache read misses
2       S-cache write misses
3       S-cache shared writes
4       S-cache writes
5       B-cache misses
6       System invalidates
7       System read requests
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Table E-17: 21164PC PM0_MUX Event Selection 


The following values choose the PM0_MUX event selection and perform the chosen operation
in Counter 0.


Value Meaning 

0 B-cache read operations 
1 B-cache D read hits 

2 B-cache D read fills 

3 B-cache write operations 
4 Undefined 

5 B-cache clean write hits 
6 B-cache victims 

7 Read miss 2 launched 


Table E-18: 21164PC PM1_MUX Event Selection


The following values choose the PM1_MUX event selection and perform the chosen operation 
in Counter 1. 


Value Meaning 

0 B-cache D read operations 

1 B-cache read hits 

2 B-cache read fills 

3 B-cache write hits 

4 B-cache write fills 

5 System read/flush B-cache hits 

6 System read/flush B-cache misses 
7 Read miss 3 launched 
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E.3 \Revision History 


Revision 7.0, November 10, 1997

Alpha AXP -> Alpha
DEC OSF/1 -> DIGITAL UNIX
Windows NT AXP -> Windows NT Alpha
Digital -> DIGITAL
Added information for EV56 and PCA56
Removed bit names
Added PMCTR for Windows NT Alpha


Revision 6.0, December 12, 1994

Added EV5 information
Added Windows NT AXP wrperfmon instruction information
Added ECO 50, Performance monitor OSF/1 and EV4 Usage
Created as Appendix F
Added Waiver 2, 3, 4
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A 


Aligned byte/word memory accesses, A-11 


Alignment 
data considerations, A-6 
double-width data paths, A-1 
instruction, A—2 
memory accesses, A-11 
AMASK bit assignments, D-3 


Arithmetic traps 
denormal operand exception enabled for, B-6 
denormal operand status of, B-—5 
division by zero, enabling, B-6 
division by zero, status of, B-—-5 
enabling, B-5 
inexact result, enabling, B-6 
inexact result, status of, B—-5 
integer overflow, disabling, B-5 
integer overflow, enabling, B-5 
invalid operation, enabling, B-6 
invalid operation, status of, B-5 
overflow, enabling, B-6 
overflow, status of, B-—5 
underflow, enabling, B-6 
underflow, status of, B—5 


ASCII character set, C—22 


Atomic sequences, A-18 


Big-endian addressing 

byte swapping for, A—12 
Boolean stylized code forms, A-15 
Branch instructions 

opcodes and format summarized, C-—1 
Byte swapping, A-—12 


C


Caches 
design considerations, A-1 
I-stream considerations, A—5 
translation buffer conflicts, A—8 


Appendix Index 


Clear a register, A-13 


Code forms, stylized, A-13 
Boolean, A-15 
load literal, A-14 
negate, A-15 
NOP, A-13 
NOT, A-15 
register, clear, A—13 
register-to-register move, A-—14 
Code sequences, A-—10 
CVTQL instruction, FP_C quadword with, B-5 


CVTTQ instruction, FP_C quadword with, B-5 


D 


Data alignment, A-6 
Data sharing (multiprocessor), A-7
Data stream considerations, A-—6 
Denormal operand exception enable (DNOE) 
FP_C quadword bit, B-6 
Denormal operand status (DNOS) 
FP_C quadword bit, B-5 
DIGITAL UNIX PALcode, instruction summary, 
C-16 
Division 
integer, A—12 
performance impact of, A-12 
Division by zero enable (DZEE) 
FP_C quadword bit, B-6 
Division by zero status (DZES) 
FP_C quadword bit, B-5 


E 


Exception handlers, B-3 


F 


Floating-point division, performance impact of, 
A-12 
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Floating-point instructions 

opcodes and format summarized, C-1
Floating-point support 

floating-point control (FP_C) quadword, B—4 
FNOP code form, A—13 


FP_C quadword, B-4 


Function codes 


IEEE floating-point, C-6 

in numerical order, C—10 
independent floating-point, C-8 
VAX floating-point, C-7 

See also Opcodes 


IEEE floating-point 


exception handlers, B-3 
floating-point control (FP_C) quadword, B-—4 


hardware support, B-2

options, B-1 

standard charts, B-12 

standard, mapping to, B-6 

trap handling, B-6 
IEEE floating-point control word, B—4 
IEEE floating-point instructions 

function codes for, C—6 
IEEE standard 

conformance to, B-—1 

mapping to, B-6 
IMPLVER value assignments, D-3 
Independent floating-point function codes, C-8 
Inexact result enable (INEE) 

FP_C quadword bit, B-6 
Inexact result status (INES) 

FP_C quadword bit, B-—5 
Instruction encodings 


common architecture, C-—1 
numerical order, C—10 
opcodes and format summarized, C-1 


Instruction stream. See I-stream 

Integer division, A-—12 

Invalid operation enable (INVE) 
FP_C quadword bit, B-6 

Invalid operation status (INVS) 
FP_C quadword bit, B-—5 

I-stream, design considerations, A-—2 


L 


Load literal, A-14 


Memory access, aligned byte/word, A-—11 


Memory format instructions 

opcodes and format summarized, C-—1 
Move, register-to-register, A—14 
Multiple instruction issue, A-3 
Multiprocessor environment, shared data, A-—7 


N 


Negate stylized code form, A-15 
NOP, universal (UNOP), A—13 
NOT stylized code form, A-—15 


O 


Opcodes 
common architecture, C-—1 
DIGITAL UNIX PALcode, C-16 
in numerical order, C—10 
OpenVMS Alpha PALcode, C-—14 
PALcode in numerical order, C-—18 
reserved, C-—21 
summary, C-8 
unused function codes for, C—21 
Windows NT Alpha PALcode, C-17 
See also Function codes 

OpenVMS Alpha PALcode, instruction summary, 

C-14 

Operate instructions 

opcodes and format summarized, C-—1 


Optimization. See Performance optimizations 


Overflow enable (OVFE) 
FP_C quadword bit, B-6 
Overflow status (OVES) 


FP_C quadword bit, B-5 


P


PALcode instructions 
opcodes and format summarized, C-1 
required, C-—20 
reserved, function codes for, C—20 
PALcode opcodes in numerical order, C-—18 


PALcode variation assignments, D-2 
Performance monitoring, E-3, E-9 


Performance optimizations
branch prediction, A-3 
code sequences, A-10 
data stream, A-6 

for I-streams, A-2
instruction alignment, A-2
instruction scheduling, A-5
I-stream density, A-5
multiple instruction issue, A-3
shared data, A-7


Processor type assignments, D-1 
Pseudo-ops, A-16 


R 


Read/write, sequential, A-9
Register-to-register move, A-14 
Reserved instructions, opcodes for, C-21 
Result latency, A-—5 


S 


Sequential read/write, A-—9 





Shared data (multiprocessor), A-7 


Software considerations, A-1 
See also Performance optimizations 


T 


Timing considerations, atomic sequences, A-—18 


Trap disable bits 
IEEE compliance and, B—4 
Trap enable bits, B-—5 


Trap handling, IEEE floating-point, B-6 
TRAPB (trap barrier) instruction, A-15 


U 


Underflow enable (UNFE) 
FP_C quadword bit, B—6 

Underflow status (UNFS) 
FP_C quadword bit, B-5 

UNOP code form, A-13 


V 


VAX floating-point instructions 
function codes for, C-—7 


W 


Waivers, E-1 


Windows NT Alpha PALcode, instruction summary, 
C-17 
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| | Master Index’ 


A 


Aborts, forcing, (I) 6-6 
Absolute longword queue, (II-A) 2—21 
Absolute quadword queue, (II-A) 2-24


Access control violation (ACV) fault, (II-A) 6-10 


has precedence, (II-A) 3—13 
memory protection, (II-A) 3-8 
service routine entry point, (II-A) 6-27 


Access violation fault, (II-B) 3-12, (II-C) 4-3 
ACCESS(x,y) operator, (I) 3-7 


Add instructions 
add longword, (I) 4-25 
add quadword, (I) 4—27 
add scaled longword, (I) 4—26 
add scaled quadword, (I) 4—28 
See also Floating-point operate 


ADDF instruction, (I) 4-110 
ADDG instruction, (I) 4-110 
ADDL instruction, (I) 4-25
ADDQ instruction, (I) 4-27 
Address space, (II-C) 3-1 


Address space match (ASM) 

bit in PTE, (II-A) 3-5, (II-B) 3-5, (II-C) 3-5

TBIAP register uses, (II-A) 5-25 

virtual cache coherency, (I) 5-4 

with context switch, (II-C) 2-10, (II-C) 5-35 
Address space number (ASN) register, (II-A) 5-4, (II-C) 2-4

at processor initialization, (III) 3-20 

defined, (II-B) 1-2 

described, (II-B) 3-11 

in HWPCB, (II-A) 4-2

in initial HWPCB, (III) 3-21 

in process context, (II-B) 4-1 

privileged context, (II-A) 2-91 

range supported, (II-A) 3-12 





TBCHK register uses, (II-A) 5—23 

TBIS register uses, (II-A) 5-26

translation buffer with, (II-A) 3-11 

virtual cache coherency, (I) 5-4 

with context switch, (II-C) 2-10 

with PALcode switching, (III) 3-8 
Address translation 

algorithm to perform, (II-A) 3-9 

page frame number (PFN), (II-A) 3-8 

page table structure, (II-A) 3-8, (II-C) 3-2

performance enhancements, (II-A) 3-10

physical, (II-B) 3-7 

translation buffer with, (II-A) 3-11 

virtual, (II-B) 3-9 

virtual address segment fields, (II-A) 3-8 
ADDS instruction, (I) 4-111


ADDT instruction, (I) 4-111 

AFTER, defined for memory access, (I) 5-12 
Aligned byte/word memory accesses, A-11 
ALIGNED data objects, (I) 1-8 


Alignment 
atomic byte, (I) 5-3 
atomic longword, (I) 5-2 
atomic quadword, (I) 5-2 
D_floating, (I) 2-6 
data alignment trap, (II-A) 6-15
data considerations, A-—6 
double-width data paths, A-1 
F_floating, (I) 2-4 
G_floating, (I) 2-5
instruction, A—2 
longword, (I) 2-2 
longword integer, (I) 2-12 
memory accesses, A—11 
program counter (PC), (II-A) 6-6 
quadword, (I) 2-3 
quadword integer, (I) 2-12 
S_floating, (I) 2-8 
stack, (II-A) 6-31 
T_floating, (I) 2-9
when data is unaligned, (II-A) 6-28


† Each section of this manual includes an index of only that section, and following this Master Index is
an index of the Alpha instruction set and the PALcode instructions for each documented operating system.


X_floating, (I) 2-10 
Alpha architecture
addressing, (I) 2-1
overview, (I) 1-1
porting operating systems to, (I) 1-1
programming implications, (I) 5-1
registers, (I) 3-1
security, (I) 1-7
See also Conventions 
Alpha privileged architecture library. See PALcode 


AMASK (Architecture mask) instruction, (I) 4-134 
AMASK bit assignments, D-3 
AMOVRM (PALcode) instruction, (II-A) 2-75
AMOVRR (PALcode) instruction, (II-A) 2-75
AND instruction, (I) 4-42
AND operator, (I) 3-7
APC_LEVEL, IRQL table index name, (II-C) 2-2 
ARC Restart Block, (II-C) 5—23 
Architecture extensions, AMASK with, (I) 4-134 
ARITH_RIGHT_SHIFT(x,y) operator, (I) 3-7 
Arithmetic exceptions, (II-C) 4-5 

See also Arithmetic traps 
Arithmetic instructions, (I) 4-24 

See also specific arithmetic instructions 
Arithmetic left shift instruction, (I) 4-41


Arithmetic trap entry (entArith) register, (II-B) 1-2, 
(II-B) 5-4 
Arithmetic traps, (II-C) 4-5 

denormal operand exception disabling, (I) 4-81 

denormal operand exception enabled for, B-6 

denormal operand status of, B-5 

described, (II-A) 6-12 

disabling, (I) 4-78 

division by zero, (I) 4-77, (I) 4-81, (II-A) 6-14, (II-B) 5-6, (II-C) 4-6

division by zero, disabling, (I) 4-81

division by zero, enabling, B-6 

division by zero, status of, B—5 

dynamic rounding mode, (I) 4—80 

enabling, B-5 

F31 as destination, (II-A) 6-12 

inexact result, (I) 4-78, (I) 4-81, (II-A) 6-14, 
(II-B) 5-5, (II-C) 4-6 

inexact result, disabling, (I) 4-80 

inexact result, enabling, B—6 

inexact result, status of, B—5 

integer overflow, (I) 4-78, (I) 4-81, (II-A) 6-15, (II-B) 5-5, (II-C) 4-6

integer overflow, disabling, B-5

integer overflow, enabling, B-5

invalid operation, (I) 4-76, (I) 4-81, (II-A) 6-14, (II-B) 5-6, (II-C) 4-6

invalid operation, disabling, (I) 4-81

invalid operation, enabling, B-6 

invalid operation, status of, B-—5 




overflow, (I) 4-77, (I) 4-81, (II-A) 6-14, (II-B) 5-5, (II-C) 4-6

overflow, disabling, (I) 4-81 

overflow, enabling, B-6 

overflow, status of, B-—5 

program counter (PC) value, (II-A) 6-14 

programming implications for, (I) 5-30 

R31 as destination, (II-A) 6-12 

recorded for software, (II-A) 6-12 

REI instruction with, (II-A) 6-9

service routine entry point, (II-A) 6-27 

system entry for, (II-B) 5—4 

TRAPB instruction with, (I) 4-146 

underflow, (I) 4-78, (I) 4-81, (II-A) 6-14, 

(II-B) 5-5, (II-C) 4-6 

underflow to zero, disabling, (I) 4-80

underflow, disabling, (I) 4-80 

underflow, enabling, B—6 

underflow, status of, B-—5 

when concurrent with data alignment, (II-A) 


when registers affected by, (II-A) 6-13
ASCII character set, C—22 
ASN_wrap_indicator, (II-C) 2-10 


AST enable (ASTEN) register 
at processor initialization, (III) 3-20 
changing access modes in, (II-A) 4—4 
described, (II-A) 5-5 
in HWPCB, (II-A) 4-2
in initial HWPCB, (III) 3-21
interrupt arbitration, (II-A) 6-35 
operation (with ASTs), (II-A) 4-4 
privileged context, (II-A) 2—91 
SWASTEN instruction with, (II-A) 2-19 
AST summary (ASTSR) register 
at processor initialization, (III) 3-20
described, (II-A) 5-7
in HWPCB, (II-A) 4-2
in initial HWPCB, (III) 3-21
indicates pending ASTs, (II-A) 4-4
interrupt arbitration, (II-A) 6-35
privileged context, (II-A) 2-91
Asynchronous procedure call (APC) 
SIRR register field for, (II-C) 4-16 
Asynchronous system traps (AST) 
ASTEN/ASTSR registers with, (II-A) 4-4 
initiating Process 
context switching the, (II-A) 4—4 
interrupt, defined, (II-A) 6-20 
service routine entry point, (II-A) 6-27 
with PS register, (II-A) 4-4
Atomic access, (I) 5-3
Atomic move operations, (II-A) 2-74


Atomic operations
accessing longword datum, (I) 5-2 
accessing quadword datum, (I) 5-2 
modifying page table entry, (II-A) 3-6 
updating shared data structures, (I) 5-7 
using load locked and store conditional, (I) 5-7 
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Atomic sequences, A-18 


AUTO_ACTION environment variable, (III) 2-27 
overriding, (III) 3-27 
state transitions and, (III) 3-1 
with cold bootstrap, (III) 3-9 
with error halts, (III) 3-30 
with system restarts, (III) 3-28 


BB_WATCH 
at power-up initialization, (III) 3-4 
requirements, (III) 3-43
with powerfail interrupts, (III) 3-28
with primary console switching, (III) 3-31 
with primary-eligible (PE) bit, (III) 3-44 


BEFORE, defined for memory access, (I) 5-12 
BEQ instruction, (I) 4-20 
BGE instruction, (I) 4-20 
BGT instruction, (I) 4—20 
BIC instruction, (I) 4-42 


Big-endian addressing, (I) 2-13 


byte operation examples, (I) 4-54
byte swapping for, A-12
extract byte with, (I) 4-51
insert byte with, (I) 4-55
load F_floating with, (I) 4-91
load long/quad locked with, (I) 4-9
load S_floating with, (I) 4-93
mask byte with, (I) 4-57
store byte/word with, (I) 4-15
store F_floating with, (I) 4-95
store long/quad conditional with, (I) 4-12
store long/quad with, (I) 4-15
store S_floating with, (I) 4-97

Big-endian data types, X_floating, (I) 2-10
BIS instruction, (I) 4-42
BITMAP_CHECKSUM, memory cluster field, (III) 3-13
BITMAP_PA, memory cluster field, (III) 3-13
BITMAP_VA, memory cluster field, (III) 3-13
BLBC instruction, (I) 4-20
BLBS instruction, (I) 4-20
BLE instruction, (I) 4-20
BLT instruction, (I) 4-20
BNE instruction, (I) 4-20


Boolean instructions, (I) 4-41 
logical functions, (I) 4-42 
Boolean stylized code forms, A-15


Boot block on disk, (III) 3-37 
Boot environment, restoring, (II-C) 5—23 
Boot sequence, establishing, (II-C) 1-2 
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BOOT_DEV environment variable, (III) 2-27 

with loading system software, (III) 3-19
BOOT_FILE environment variable, (III) 2-27, (III) 3-39
with loading system software, (III) 3-19
BOOT_OSFLAGS environment variable, (III) 2-28
with loading system software, (III) 3-19
BOOT_RESET environment variable, (III) 2-28
at system initialization, (III) 3-3
at warm bootstrap, (III) 3-22
overriding, (III) 3-27
with cold bootstrap, (III) 3-9
BOOTDEF_DEV environment variable, (III) 2-27
with loading system software, (III) 3-19
BOOTED_DEV environment variable
with loading system software, (III) 3-19
BOOTED_FILE environment variable, (III) 2-28
with loading system software, (III) 3-19
BOOTED_OSFLAGS environment variable, (III) 2-28
with loading system software, (III) 3-19

BOOTP-UDP/IP network protocol, (III) 3-42 


Bootstrap address space 
regions, (III) 3-13 
Bootstrap-in-progress (BIP) flag 
at multiprocessor boot, (III) 3-23 
at power-up initialization, (III) 3-4
at processor initialization, (III) 3-20
per-CPU state contains, (III) 2-22
state transitions and, (III) 3-1
with failed bootstrap, (III) 3-18
with secondary console, (III) 3-26
Bootstrapping, (III) 3-1
adding processor while running system, (III) 3-26
address space at cold, (III) 3-13
boot block in ROM, (III) 3-41
boot block on disk, (III) 3-37
cold in uniprocessor environment, (III) 3-9
control to system software, (III) 3-21
failure of, (III) 3-18
from disk, (III) 3-37
from magtape, (III) 3-38
from MOP-based network, (III) 3-42
from ROM, (III) 3-41
implementation considerations, (III) 3-44
loading page table space at cold, (III) 3-14
loading primary image, (III) 3-36
loading system software, (III) 3-18
MEMC table at cold boot, (III) 3-12
multiprocessor, (III) 3-23
PALcode loading at cold, (III) 3-13
processor initialization, (III) 3-20
request from system software, (III) 3-27
state flags with, (III) 3-18
system, (III) 3-3
unconditional, (III) 3-27
warm, (III) 3-22 
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BPT (PALcode) instruction, (II-A) 2-4 
required recognition of, (I) 6-4 
service routine entry point, (II-A) 6-28 
trap information, (II-A) 6-16 
bpt (PALcode) instruction, (II-B) 2—2, (II-C) 5—46 
required recognition of, (I) 6-4 
BR instruction, (I) 4-21 


Branch instructions, (I) 4-18 


backward conditional, (I) 4-20 
conditional branch, (I) 4—20 
floating-point, summarized, (I) 4-99 
format of, (I) 3-12 

forward conditional, (I) 4—20 
opcodes and format summarized, C-1 
unconditional branch, (I) 4-21

See also Control instructions 


Branch prediction model, (I) 4-18 
Branch prediction stack, with BSR instruction, (I) 4-21
Breakpoint exceptions, (II-C) 4-8
initiating, (II-A) 2-4
Breakpoint trap, initiating, (II-B) 2-2
BSR instruction, (I) 4-21
Bugcheck exception, initiating, (II-A) 2-5
BUGCHK (PALcode) instruction, (II-A) 2-5
required recognition of, (I) 6-4 
service routine entry point, (II-A) 6-28 
trap information, (II-A) 6-16 
bugchk (PALcode) instruction, (II-B) 2-3 
required recognition of, (I) 6-4 
Byte data type, (I) 2-1 
atomic access of, (I) 5-3 
Byte manipulation, (I) 1-2 
Byte manipulation instructions, (I) 4-47 
Byte swapping, A—12 
Byte_within_page field, (II-A) 3-2, (II-B) 3-2 
BYTE_ZAP(x,y) operator, (I) 3-7 


C


/C opcode qualifier 
IEEE floating-point, (I) 4-67 
VAX floating-point, (I) 4-67 
C opcode qualifier, (I) 4-67 
Cache blocks, virtual 
invalidating all, (II-C) 5-36 
invalidating multiple, (II-C) 5—37 
invalidating single, (II-C) 5-39 
Cache coherency, (II-C) 2-8 


barrier instructions for, (I) 5—25 
defined, (I) 5-2 
HAL interface for, (II-C) 1-3 







in multiprocessor environment, (I) 5-6 
Caches 


design considerations, A-—1 
flushing physical page from, (II-A) 2-83, (II-B) 
2-11 


I-stream considerations, A-—5 

MB and IMB instructions with, (I) 5-25 
requirements for, (I) 5-5 

translation buffer conflicts, A-8 

with powerfail/recovery, (I) 5-5 


CALL_PAL (call privileged architecture library) 
instruction, (I) 4-136 
callkd (PALcode) instruction, (II-C) 5-47 
callsys (PALcode) instruction, (II-B) 2-4, (II-C) 
5-48 
entSys with, (II-B) 5-9 
stack frames for, (II-B) 5-3 


CASE operator, (I) 3-8 
Causal loops, (I) 5-15 
CFLUSH (PALcode) instruction, (II-A) 2-83 
ECB compared with, (I) 4-139 
with powerfail, (II-A) 6—22 
cflush (PALcode) instruction, (II-B) 2-11 
Changed datum, (I) 5-6
CHAR_SET environment variable, (III) 2-29 


Characters 


getting from console, (III) 2—36 
writing to console terminal, (III) 2-40 


Charged process cycles register, (II-A) 2-91 
in HWPCB, (II-A) 4-2
in process context, (II-B) 4-1 
PCC register and, (II-A) 4-3 


Checksum, HWRPB field for, (III) 2-10 
at multiprocessor boot, (III) 3-23 

CHME (PALcode) instruction, (II-A) 2-6 
service routine entry point, (II-A) 6—28 
trap initiation, (II-A) 6-17 

CHMK (PALcode) instruction, (II-A) 2-7
service routine entry point, (II-A) 6-28
trap initiation, (II-A) 6-17

CHMS (PALcode) instruction, (II-A) 2-8
service routine entry point, (II-A) 6-28
trap initiation, (II-A) 6-17

CHMU (PALcode) instruction, (II-A) 2-9
service routine entry point, (II-A) 6-28
trap initiation, (II-A) 6-17


Clear a register, A-13 

Clock. See BB_WATCH 

CLOCK_HIGH, IRQL table index name, (II-C) 2-2 
CLOSE device routine, (III) 2—48 

CLRFEN (PALcode) instruction, (II-A) 2—10 
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clrfen (PALcode) instruction, (II-B) 2—5 
Clusters, memory, (III) 3-10 
CMOVEQ instruction, (I) 4-43 
CMOVGE instruction, (I) 4—43 
CMOVGT instruction, (I) 4—43 
CMOVLBC instruction, (I) 4-43 
CMOVLE instruction, (I) 4-43
CMOVLT instruction, (I) 4-43 
CMOVNE instruction, (I) 4-43 
CMPBGE instruction, (I) 4—49 
CMPEQ instruction, (I) 4-29 
CMPGLE instruction, (I) 4-112 
CMPGLT instruction, (I) 4-112 
CMPLE instruction, (I) 4-29 
CMPLT instruction, (I) 4-29 
CMPTEQ instruction, (I) 4-113 
CMPTLE instruction, (I) 4-113
CMPTLT instruction, (I) 4-113 
CMPTUN instruction, (I) 4-113 
CMPULE instruction, (I) 4-30 
CMPULT instruction, (I) 4-30 


Code forms, stylized, A-—13 

Boolean, A-15 

load literal, A—-14 

negate, A-15 

NOP, A-13 

NOT, A-15 

register, clear, A—13 

register-to-register move, A—14 
Code scheduling 

IMPLVER instruction with, (I) 4-142 


Code sequences, A—10 
CODEC, (I) 4-154 


Coherency 
cache, (I) 5-2 
memory, (I) 5-1 


Compare instructions 


compare integer signed, (I) 4-29 
compare integer unsigned, (I) 4-30 
See also Floating-point operate 


Conditional move instructions, (I) 4-43
See also Floating-point operate 
CONFIG block, in HWRPB, (III) 2-11
CONFIG offset, HWRPB field for, (III) 2-9
CONFIG. See Configuration data block 
Configuration data block, (III) 2-23 


Console 
at warm bootstrap, (III) 3-22
console I/O mode, (III) 3-3
data structure linkage, (III) 2-69
data structures loading at cold boot, (III) 3-13
definition, (III) 1-1
detached, (III) 1-2 
detached implementations of, (III) 3-46 
embedded, (III) 1-2 
embedded implementation of, (III) 3-44 
error halt and recovery, (III) 3-30 
forcing entry to I/O mode, (III) 3-36 
HWRPB with, (III) 2-1
implementation registry, (III) 1-2 
implementations, (III) 1-2 
inter-console communications buffer, (III) 2-76 
internationalization, (III) 1-4 
interprocessor communications for, (III) 2-75 
ISO Latin-1 support with, (III) 1-4
loading PALcode, (III) 3-13 
loading system software, (III) 3-18 
lock mechanisms, (III) 1-2 
major state transitions, (III) 3-2 

messages for, (III) 1-3
miscellaneous routines, (III) 2-63
multiprocessor boot, (III) 3-23
multiprocessor implementation of, (III) 3-44 
presentation layer, (III) 1-3 
processor state flags, (III) 3-18 
program I/O mode, (III) 3-3 
remapping routines, (III) 2-71 
required environment variables, (III) 2-27 
requirements for, (III) 1-2 
resetting, (III) 2-42
RESTORE_TERM routine, (III) 3-34, (III) 3-35
SAVE_TERM routine, (III) 3-34, (III) 3-35
secondary at multiprocessor boot, (III) 3-25
security for, (III) 1-4
sending commands to secondary, (III) 2-77 
sending messages to primary, (III) 2-78 
supported character sets, (III) 2-30
switching primary processors, (III) 2-64 
with system restarts, (III) 3-27 


Console callback routine block, in HWRPB, (III) 
2-10 
Console callback routines, (III) 2-31 


at cold boot, (III) 3-13 

CTB describes, (III) 2-73

data structures for, (III) 2-69 

fixing up the virtual address, (III) 2-63 
HWRPB field for, (III) 2-8 
remapping, (III) 2-71 

summary of, (III) 2-32 

system software invoking, (III) 2-32 


Console environment variables 


loading system software, (III) 3-19 
See also Environment variables 


Console firmware, transferring to, (II-C) 5-23 
Console I/O mode, (III) 3-3 

forcing entry to, (III) 3-36
Console initialization mode, (III) 3-3
Console interface, (III) 2-1
Console overview, (I) 7-1


Console routine block (CRB), (III) 2-69 
console callback routines with, (III) 2-69 
initializing, (III) 2-71
offset, HWRPB field for, (III) 2-8 

structure of, (III) 2-69 

Console terminal block (CTB) 
console callback routines with, (III) 2-69 
described, (III) 2-33, (III) 2-73
HWRPB fields for, (III) 2-8
number, HWRPB field for, (III) 2-8
offset, HWRPB field for, (III) 2-8
size, HWRPB field for, (III) 2-8
structure of, (III) 2-74


Console terminal routines, (III) 2-33 


Context switching 
between address spaces, (II-C) 5-35 
defined, (II-A) 4-1 
hardware, (II-A) 4-2 
initiating, (II-A) 2-91
PDR register with, (II-C) 3-3 
raising IPL while, (II-A) 4-4 
software, (II-A) 4-2 
thread, (II-C) 5-30 
thread to process, (II-C) 2-10 
thread to thread, (II-C) 2-10 
See also Hardware 

Context valid (CV) flag 
at multiprocessor boot, (III) 3-23 
at processor initialization, (III) 3-20 
per-CPU state contains, (III) 2-22 


Control instructions, (I) 4-18 


Conventions 
code examples, (I) 1-9 
code flows, (II-C) 1-4
extents, (I) 1-8 
figures, (I) 1-9 
instruction format, (I) 3-10 
notation, (I) 3-10 
numbering, (I) 1-7 
Corrected error interrupts, logout area for, (II-A) 
6-24 


Count instructions 


Count leading zero, (I) 4-31 
Count population, (I) 4-32 
Count trailing zero, (I) 4-33 


CPU ID, HWRPB field for primary, (III) 2-6 
at multiprocessor boot, (III) 3-23 

CPU slot offset, HWRPB field for, (III) 2-8

CPYS instruction, (I) 4-105

CPYSE instruction, (I) 4-105

CPYSN instruction, (I) 4-105 

CSERVE (PALcode) instruction, (II-A) 2-84 
required recognition of, (I) 6-4 

cserve (PALcode) instruction, (II-B) 2-12
required recognition of, (I) 6-4

csir (PALcode) instruction, (II-C) 5-4 
clears software interrupts, (II-C) 4-16 

CTB table, in HWRPB, (III) 2-10 

CTB. See Console terminal block 

CTLZ instruction, (I) 4-31 

CTPOP instruction, (I) 4-32 

CTTZ instruction, (I) 4-33 

Current mode field, in PS register, (II-A) 6-6 

Current PALcode, (III) 3-5 

Current PC, (II-A) 6-2 

CVTDG instruction, (I) 4-116 

CVTGD instruction, (I) 4-116 

CVTGF instruction, (I) 4-116 

CVTGQ instruction, (I) 4-114 

CVTLQ instruction, (I) 4-106 

CVTQF instruction, (I) 4-115

CVTQG instruction, (I) 4-115 


CVTQL instruction, (I) 4-106 
FP_C quadword with, B-5 
CVTQS instruction, (I) 4-118
CVTQT instruction, (I) 4-118
CVTST instruction, (I) 4-120
CVTTQ instruction, (I) 4-117
FP_C quadword with, B-5
CVTTS instruction, (I) 4-119


Cycle counter frequency, HWRPB field for, (III) 
2-7 


D 


/D opcode qualifier 
FPCR (floating-point control register), (I) 4-79 
IEEE floating-point, (I) 4-67 
D_floating data type, (I) 2-5 
alignment of, (I) 2-6
mapping, (I) 2-6 
restricted, (I) 2-6 
dalnfix (PALcode) instruction, (II-C) 5-5, (II-C) 
5-9 
Data alignment, A-6 


Data alignment trap (DAT) register 
privileged context, (II-A) 2-91 

Data alignment traps, (II-A) 6-15 
fixup (DAT) bit, in HWPCB, (II-A) 4-2 
fixup (DATFX) register, (II-A) 5-9 
registers used, (II-A) 6-15 
service routine entry point, (II-A) 6-28 
system entry for, (II-B) 5-9
when concurrent with arithmetic, (II-A) 6-15


Data caches 


ECB instruction with, (I) 4-137 
WH64 instruction with, (I) 4-147 


Data format, overview, (I) 1-3 


Data sharing (multiprocessor), A-7
synchronization requirement, (I) 5-6
Data stream considerations, A-6
Data stream translation buffer (DTB), (III) 2-14 
Data structures, shared, (I) 5-6 
Data types 
byte, (I) 2-1
IEEE floating-point, (I) 2-6 
longword, (I) 2-2 
longword integer, (I) 2-11
quadword, (I) 2-2 
quadword integer, (I) 2-12 
unsupported in hardware, (I) 2-12 
VAX floating-point, (I) 2-3 
word, (I) 2-1 
DATA_BUS_ERROR code, (II-C) 4-18 
Deferred procedure call (DPC) 


SIRR register field for, (II-C) 4-16 
stack for, (II-C) 2-9 


Denormal, (I) 4-64 

Denormal operand exception disable, (I) 4-81 

Denormal operand exception enable (DNOE) 
FP_C quadword bit, B-6 

Denormal operand status (DNOS) 
FP_C quadword bit, B-5 

Denormal operands to zero, (I) 4-81 

Depends order (DP), (I) 5-15 

Detached console, (III) 1-2 

DEVICE ID, CTB field for, (III) 2-75 

DEVICE TYPE, CTB field for, (III) 2-75

DEVICE_HIGH_LEVEL, IRQL table index name, 

(II-C) 2-2 
DEVICE_LEVEL, IRQL table index name, (II-C) 
2-2 

Device-specific data (DSD), (III) 2-75

di (PALcode) instruction, (II-C) 5-6 

DIGITAL UNIX PALcode, instruction summary, 

C-16 

Dirty pages, tracking, (II-C) 3-5 

Dirty zero, (I) 4-64

Disk bootstrap image, (III) 3-37 

DISPATCH procedure, (III) 2-70 

DISPATCH, CRB fields for, (III) 2-70

DISPATCH_LEVEL, IRQL table index name,
(II-C) 2-2
DIV operator, (I) 3-8
DIVF instruction, (I) 4-121
DIVG instruction, (I) 4-121
Division
integer, A-12
performance impact of, A-12


Division by zero bit, exception summary register, 
(II-C) 4-6 
Division by zero enable (DZEE) 
FP_C quadword bit, B-6 
Division by zero status (DZES) 
FP_C quadword bit, B-5
Division by zero trap, (II-A) 6-14, (II-B) 5-6, (II-C)
4-6
DIVS instruction, (I) 4-122
DIVT instruction, (I) 4-122
DMA control, HAL interface for, (II-C) 1-3 
DMK bit, machine check error summary register, 
(II-C) 4-18 
DNOD bit. See Denormal operand exception disable 
DNZ. See Denormal operands to zero 
DP. See Depends order 
DPC bit, machine check error summary register, 
(II-A) 5-14, (II-B) 5-8, (II-C) 4-18 
DRAINA (PALcode) instruction 
required, (I) 6-5 
draina (PALcode) instruction, (II-C) 5-7 


required, (I) 6-5 
with machine checks, (II-C) 4-18 


DSC bit, machine check error summary register, 
(II-A) 5-14, (II-B) 5-8, (II-C) 4-18 

DSD LENGTH, CTB field for, (III) 2-75 

DSD, CTB field for, (III) 2-75

DSRDB block, in HWRPB, (III) 2-11

DSRDB offset, HWRPB field for, (III) 2-10

DSRDB structure, (III) 2-23 

DTB. See data stream translation buffer 

dtbis (PALcode) instruction, (II-C) 3-5, (II-C) 5-8 

DUMP_DEV environment variable, (III) 2-28 


DYN bit. See Arithmetic traps, dynamic rounding 
mode 


Dynamic system recognition data block. See DSRDB 


DZE bit 


exception summary parameter, (II-A) 6-13 
exception summary register, (II-B) 5-6, (II-C) 


See also Arithmetic traps, division by zero
DZED bit. See Trap disable bits, division by zero


E 


ealnfix (PALcode) instruction, (II-C) 5-9
ECB (Evict data cache block) instruction, (I) 4-137
CFLUSH (PALcode) instruction with, (I) 4-139
ei (PALcode) instruction, (II-C) 5-10
as synchronization function, (II-C) 4-15 
Embedded console, (III) 1-2 
ENABLE_AUDIT environment variable, (III) 2-29, 
(III) 3-36 
entArith. See Arithmetic trap entry. 
entIF. See Instruction fault entry 
entInt. See Interrupt entry 
entMM. See Memory management fault entry 
ENTRY, CRB field for, (III) 2-71
entSys. See System call entry 
entUna. See Unaligned access fault 


Environment variables, (III) 2-25 
at power-up initialization, (III) 3-4 
at processor initialization, (III) 3-20 
getting, (III) 2-58 
resetting, (III) 2-59 
routines described, (III) 2-57 
saving, (III) 2-60 
setting, (III) 2-62 


EQV instruction, (I) 4-42 

Error halt and recovery, (III) 3-30 

Error messages 
console, (III) 1-3 

Errors, correctable, (II-C) 4-16 

Errors, correctable processor, (II-B) 5-8 

Errors, correctable system, (II-B) 5-8 

Errors, uncorrectable, (II-C) 4-17

EXCB (exception barrier) instruction, (I) 4-139
with FPCR, (I) 4-84


Exception classes, (II-C) 4-1 


registry of handling routines for, (II-C) 5-41
values for, (II-C) 5-42 


Exception dispatch, (II-C) 4-1 
Exception handlers, B-3 
TRAPB instruction with, (I) 4-146
Exception handling routines, registry for, (II-C)
5-41
Exception register write mask, (II-B) 5-6 


Exception service routines 
entry point, (II-A) 6-26 
introduced, (II-A) 6-8
Exception summary parameter, (II-A) 6-12
Exception summary register, (II-B) 5-2, (II-B) 5-5,
(II-C) 4-6
format of, (II-B) 5-5
EXCEPTION_SUMMARY, (II-C) 4-6 


Exceptional events 


actions, summarized, (II-A) 6-2
defined, (II-A) 6-1


ExceptionPC address, (II-C) 4-5 


Exceptions 
actions, summarized, (II-A) 6-2 
arithmetic, (II-C) 4-5 
breakpoint, (II-C) 4-8 
defined, (II-B) 5-1
F31 with, (I) 3-2
general class common dispatch, (II-C) 4-10
general class of, (II-C) 4-4
illegal instruction, (II-C) 4-7 
inibalsineentrunoints.. Gl Oye 2 
initiated before interrupts, (II-A) 6-18 
initiated by PALcode, (II-A) 6-31 
introduced, (II-A) 6-8
invalid address, (II-C) 4-7
memory management class, (II-C) 4-3
processor state transitions, (II-A) 6-37
R31 with, (I) 3-1
returning from, (II-C) 4-2, (II-C) 5-27
software, (II-C) 4-8 
stack frames for, (II-A) 6-7, (II-B) 5-3 
subsetted IEEE, (II-C) 4-9 
system service calls, (II-C) 4-4 
trap frames with, (II-C) 4-3 
unaligned access, (II-C) 4-7 
See also Arithmetic traps 


Executive read enable (ERE), bit in PTE, (II-A) 3-5 


Executive stack pointer (ESP) register, (II-A) 5-10 


as internal processor register, (II-A) 5-1 

in HWPCB, (II-A) 4-2

in initial HWPCB, (III) 3-21 
Executive write enable (EWE), bit in PTE, (II-A) 3-4
EXTBL instruction, (I) 4-51 
Extended VA size, HWRPB field for, (III) 2-7 
EXTLH instruction, (I) 4-51
EXTLL instruction, (I) 4-51 
EXTQH instruction, (I) 4-51 
EXTQL instruction, (I) 4-51 
Extract byte instructions, (I) 4-51 
EXTWH instruction, (I) 4-51 
EXTWL instruction, (I) 4-51 


F 


F_floating data type, (I) 2-3

alignment of, (I) 2-4 
compared to IEEE S_floating, (I) 2-8 
MAX/MIN, (I) 4-65
when data is unaligned, (II-A) 6-28 

Fault on execute (FOE), (II-A) 6-12, (II-B) 3-12
bit in PTE, (II-A) 3-6, (II-B) 3-5
service routine entry point, (II-A) 6-27
software usage of, (II-A) 6-12

Fault on read (FOR), (II-A) 6-11, (II-B) 3-12
bit in PTE, (II-A) 3-6, (II-B) 3-5
service routine entry point, (II-A) 6-27
software usage of, (II-A) 6-11

Fault on write (FOW), (II-A) 6-11, (II-B) 3-12
bit in PTE, (II-A) 3-6, (II-B) 3-5, (II-C) 3-5
service routine entry point, (II-A) 6-27
software usage of, (II-A) 6-11

Faults, (II-A) 6-9 
access control violation, (II-A) 6-10
defined, (II-A) 6-9, (II-B) 5-1
fault on execute, (II-A) 6-12, (II-B) 3-12
fault on read, (II-A) 6-11, (II-B) 3-12
fault on write, (II-A) 6-11, (II-B) 3-12
floating-point disabled, (II-A) 6-10
memory management, (II-B) 3-12
MM flag, (II-A) 6-10
program counter (PC) value, (II-A) 6-9
REI instruction with, (II-A) 6-9
translation not valid, (II-A) 6-10


FBEQ instruction, (I) 4-100 

FBGE instruction, (I) 4-100 

FBGT instruction, (I) 4-100 

FBLE instruction, (I) 4-100 

FBLT instruction, (I) 4-100 

FBNE instruction, (I) 4-100

FCMOVEQ instruction, (I) 4-107 

FCMOVGE instruction, (I) 4-107 

FCMOVGT instruction, (I) 4-107 

FCMOVLE instruction, (I) 4-107

FCMOVLT instruction, (I) 4-107

FCMOVNE instruction, (I) 4-107 

FEN. See Floating-point enable 

FETCH (prefetch data) instruction, (I) 4-140

FETCH_M (prefetch data, modify intent) instruction,
(I) 4-140

Field replaceable unit (FRU) 


offset, HWRPB field for, (III) 2-9
table description, (III) 2-23
table, in HWRPB, (III) 2-11 


Finite number, Alpha, contrasted with VAX, (I)
4-63

Firmware components, (II-C) 1-2

Firmware restart, (II-C) 2-7
Firmware restart address, (II-C) 6-4


FIXUP console routine, (III) 2-63 


procedure descriptor for, (III) 2-70

using, (III) 2-72 

with PALcode switching, (III) 3-7 
FLOAT_REGISTER_MASK, (II-C) 4-5 


Floating-point branch instructions, (I) 4-99 


Floating-point control register (FPCR) 
accessing, (I) 4-82 
at processor initialization, (I) 4-83 
bit descriptions, (I) 4-80 
instructions to read/write, (I) 4-109 
operate instructions that use, (I) 4-102 
saving and restoring, (I) 4-83 
trap disable bits in, (I) 4-78
Floating-point convert instructions, (I) 3-14
Fa field requirements, (I) 3-14
Floating-point disabled fault, (II-A) 6-10
service routine entry point, (II-A) 6-27 
Floating-point division, performance impact of, 
A-12 
Floating-point enable (FEN) register 
at processor initialization, (III) 3-20
clearing, (II-A) 2-10 
defined, (II-B) 1-3 
described, (II-A) 5-11 
in HWPCB, (II-A) 4-2
in initial HWPCB, (III) 3-21 
in process context, (II-B) 4-1 
privileged context, (II-A) 2-91 
with PALcode switching, (III) 3-8 
Floating-point format, number representation 
(encodings), (I) 4-65


Floating-point instructions
branch, (I) 4-99
faults, (I) 4-62
function field format, (I) 4-84
introduced, (I) 4-62
memory format, (I) 4-90
opcodes and format summarized, C-1
operate, (I) 4-102
rounding modes, (I) 4-66
terminology, (I) 4-63
trapping modes, (I) 4-69
traps, (I) 4-62
Floating-point load instructions, (I) 4-90
load F_floating, (I) 4-91
load G_floating, (I) 4-92
load S_floating, (I) 4-93
load T_floating, (I) 4-94
with non-finite values, (I) 4-90
Floating-point operate instructions, (I) 4-102
add (IEEE), (I) 4-111
add (VAX), (I) 4-110
compare (IEEE), (I) 4-113
compare (VAX), (I) 4-112
conditional move, (I) 4-107
convert IEEE floating to integer, (I) 4-117 




convert integer to IEEE floating, (I) 4-118

convert integer to integer, (I) 4-106 

convert integer to VAX floating, (I) 4-115 

convert S_floating to T_floating, (I) 4-119 

convert T_floating to S_floating, (I) 4-120 

convert VAX floating to integer, (I) 4-114 

convert VAX floating to VAX floating, (I)
4-116

copy sign, (I) 4-105 

divide (IEEE), (I) 4-122 

divide (VAX), (I) 4-121

format of, (I) 3-13 

from integer moves, (I) 4-125 

move from/to FPCR, (I) 4-109

multiply (IEEE), (I) 4-128 

multiply (VAX), (I) 4-127 

subtract (IEEE), (I) 4-132

subtract (VAX), (I) 4-131 

to integer moves, (I) 4-123

unused function codes with, (I) 3-14 


Floating-point registers, (I) 3-2 
with PALcode switching, (III) 3-8 
See also Registers 
Floating-point single-precision operations, (I) 4-62 
Floating-point store instructions, (I) 4-90 
store F_floating, (I) 4-95 
store G_floating, (I) 4-96 
store S_floating, (I) 4-97 
store T_floating, (I) 4-98 
with non-finite values, (I) 4-90
Floating-point support


floating-point control (FP_C) quadword, B-4
IEEE, (I) 2-6

IEEE standard 754-1985, (I) 4-88
instruction overview, (I) 4-62 
longword integer, (I) 2-11 

operate instructions, (I) 4-102 
optional, (I) 4-2 

quadword integer, (I) 2-12 
rounding modes, (I) 4-66 
single-precision operations, (I) 4-62 
trap modes, (I) 4-69

VAX, (I) 2-3 


Floating-point to integer move, (I) 4-123 
Floating-point to integer move instructions, (I) 3-14 


Floating-point trapping modes, (I) 4-69 
See also Arithmetic traps 
FNOP code form, A-13 


FOE. See Fault on execute 

FOR. See Fault on read 

FOW. See Fault on write 

FP. See Frame pointer 

FP_C quadword, B-4

FPCR. See Floating-point control register 

Frame pointer (FP) register, linkage for, (II-B) 1-1 
FRU. See Field replaceable unit
FTOIS instruction, (I) 4-123
FTOIT instruction, (I) 4-123


Function codes
IEEE floating-point, C-6
in numerical order, C-10
independent floating-point, C-8
VAX floating-point, C-7
See also Opcodes 


G 


G_floating data type, (I) 2-4 
alignment of, (I) 2-5 
mapping, (I) 2-5 
MAX/MIN, (I) 4-65 
when data is unaligned, (II-A) 6-28 
General class exceptions, (II-C) 4-4
common dispatch of, (II-C) 4-10
General exception address (GENERAL_ENTRY)
register, (II-C) 2-4
GENTRAP (PALcode) instruction, (II-A) 2-11
required recognition of, (I) 6-4
trap information, (II-A) 6-17
gentrap (PALcode) instruction, (II-B) 2-6, (II-C) 
5-50 
raises software exceptions, (II-C) 4-8 
required recognition of, (I) 6-4 
GET_ENV variable routine, (III) 2-58 


GETC terminal routine, (III) 2-36 
ISO Latin-1 support and, (III) 1-4 
GH. See Granularity hint 


Global pointer (GP) register, linkage for, (II-B) 1-1 
Global translation hint, (II-C) 3-5 


Granularity hint (GH) 
bits in PTE, (II-A) 3-5, (II-B) 3-5, (II-C) 3-5
block in HWRPB, (III) 2-14
H


HAL (Hardware abstraction layer), (II-C) 1-3 


HALT (PALcode) instruction 
required, (I) 6-7 
state transitions and, (III) 3-1 
halt (PALcode) instruction, (II-C) 5-11 
required, (I) 6-7 
writes PAL_BASE register, (II-C) 2-5 
See also reboot (PALcode) instruction 
Halt PCBB register, per-CPU slot field for, (III)
2-19
Halt processor, per-CPU slot fields for, (III) 2-19 


Halt requested, per-CPU state flag, (III) 2-21 
at multiprocessor boot, (III) 3-23
Hardware abstraction layer
interfaces for, (II-C) 1-3 
Hardware context, (II-B) 4-1 
Hardware errors, when unrecoverable, (II-C) 4-10 


Hardware interrupts, (II-C) 4-13 
interprocessor, (II-A) 6-21 
interval clock, (II-A) 6-20
powerfail, (II-A) 6-21 
servicing, (II-B) 5-7 
Hardware nonprivileged context, (II-A) 4-3 


Hardware privileged context, (II-A) 4-2 
switching, (II-A) 4-2
Hardware privileged context block (HPCB)
process unique value in, (II-A) 2-79
swapping ownership of, (II-A) 2-91
Hardware privileged context block (HWPCB)
at cold boot, (III) 3-21
at warm boot, (III) 3-22 
format, (II-A) 4-2 
original built by HWRPB, (II-A) 4-5 
PCBB register, (II-A) 5-16 
specified by PCBB, (II-A) 4-2 
writing to, (II-A) 4-3 
Hardware restart parameter block (HWRPB), (III)
2-1
fields for, (III) 2-6
interval clock interrupt, (II-A) 6-20 
loading at cold boot, (III) 3-13 
logout area, (II-A) 6-24 
overview of, (III) 2-2 
size field in, (III) 2-6 
structure of, (III) 2-4
with cold boot, (III) 3-9 


HIGH_LEVEL, IRQL table index name, (II-C) 2-2 
HWPCB. See Hardware privileged context block
HWRPB. See Hardware restart parameter block 


I


I/O access, nonmapped, (II-C) 3-1 
I/O device interrupts, (II-A) 6-20 


I/O device registers, at power-up initialization, (III) 
3-4 


I/O devices 
closing generic for access, (III) 2-48 
device-specific operations for, (III) 2-49 
generic routines for, (III) 2-46 
opening generic for access, (III) 2-51 
reading from generic, (III) 2-53 
required implementation support for, (III) 2-51
service routine entry points, (II-A) 6-30 
writing to generic, (III) 2-55 

I/O devices, DMA 
MB and WMB with, (I) 5-22
reliably communicating with processor, (I)
shared memory locations with, (I) 5-11
I/O interface overview, (I) 8-1 


I/O support, HAL interface for, (II-C) 1-3 


IEEE floating-point 
exception handlers, B-3
floating-point control (FP_C) quadword, B-4 
format, (I) 2-6 
FPCR (floating-point control register), (I) 4-79 
function field format, (I) 4-85 
hardware support, B-2 
NaN, (I) 2-6
options, B-1
S_floating, (I) 2-7
standard charts, B-12
standard, mapping to, B-6 
T_floating, (I) 2-8 
trap handling, B-6 
X_floating, (I) 2-9 
See also Floating-point instructions 


IEEE floating-point control word, B-4


IEEE floating-point instructions 
add instructions, (I) 4-111
compare instructions, (I) 4-113
convert from integer instructions, (I) 4-118
convert S_floating to T_floating, (I) 4-119
convert T_floating to S_floating, (I) 4-120
convert to integer instructions, (I) 4-117
divide instructions, (I) 4-122
from integer moves, (I) 4-125
function codes for, C-6
multiply instructions, (I) 4-128
operate instructions, (I) 4-102
square root instructions, (I) 4-130
subtract instructions, (I) 4-132
to register moves, (I) 4-123
IEEE standard, (I) 4-88
conformance to, B-1 
mapping to, B-6 
IEEE, subsetted instruction exception, (II-C) 4-9 
IGN (ignore), (I) 1-9 
IKSP register. See Kernel stack pointer, initial 
Illegal instruction exceptions, (II-C) 4-7 


Illegal instruction trap, (II-A) 6-16
service routine entry point, (II-A) 6-28
Illegal operand trap, service routine entry point,
(II-A) 6-28
Illegal PALcode operand trap, (II-A) 6-16
IMB (PALcode) instruction, (I) 5-23


required, (I) 6-8
virtual I-cache coherency, (I) 5-5


imb (PALcode) instruction, (II-C) 5-51 
required, (I) 6-8 
IMP (implementation dependent), (I) 1-9
IMPLVER (Implementation version) instruction, (I)
4-142
IMPLVER value assignments, D-3
Independent floating-point function codes, C-8 


INE bit 


exception summary parameter, (II-A) 6-13 
exception summary register, (II-B) 5-5, (II-C)
4-6 


See also Arithmetic traps, inexact result 
INED bit. See Trap disable bits, inexact result trap 


Inexact result bit, exception summary register, (II-C) 
4-6 
Inexact result enable (INEE) 
FP_C quadword bit, B-6 
Inexact result status (INES) 
FP_C quadword bit, B-5 
Inexact result trap, (II-A) 6-14, (II-B) 5-5, (II-C) 
4-6 
Infinity, (I) 4-64 
conversion to integer, (I) 4-88
Initialization, PALcode environment, (II-C) 6-1 
initpal (PALcode) instruction, (II-C) 5-12, (II-C) 
5-14 
at initialization, (II-C) 6-2 
reads PAL_BASE register, (II-C) 2-5 
writes KGP register, (II-C) 2-4 
writes PCR register, (II-C) 2-5 
writes PDR register, (II-C) 2-6 
initpcr (PALcode) instruction, (II-C) 5-14 
INSBL instruction, (I) 4-55
Insert byte instructions, (I) 4-55


Insert into queue PALcode instructions 


longword, (II-A) 2-46 
longword at head interlocked, (II-A) 2-30
longword at head interlocked resident, (II-A) 2-32
longword at tail interlocked, (II-A) 2-38
longword at tail interlocked resident, (II-A) 2-40
quadword, (II-A) 2-48 
quadword at head interlocked, (II-A) 2-34 
quadword at head interlocked resident, (II-A) 


quadword at tail interlocked, (II-A) 2-42
quadword at tail interlocked resident, (II-A) 


INSLH instruction, (I) 4-55 

INSLL instruction, (I) 4-55 

INSQH instruction, (I) 4-55

INSQHIL (PALcode) instruction, (II-A) 2-30
INSQHILR (PALcode) instruction, (II-A) 2-32
INSQHIQ (PALcode) instruction, (II-A) 2-34
INSQHIQR (PALcode) instruction, (II-A) 2-36
INSQL instruction, (I) 4-55
INSQTIL (PALcode) instruction, (II-A) 2-38
INSQTILR (PALcode) instruction, (II-A) 2-40
INSQTIQ (PALcode) instruction, (II-A) 2-42
INSQTIQR (PALcode) instruction, (II-A) 2-44
INSQUEL (PALcode) instruction, (II-A) 2-46
INSQUEL/D (PALcode) instruction, (II-A) 2-46
INSQUEQ (PALcode) instruction, (II-A) 2-48
INSQUEQ/D (PALcode) instruction, (II-A) 2-48


Instruction encodings 
common architecture, C-1
numerical order, C-10
opcodes and format summarized, C-1


Instruction fault entry (entIF) register, (II-B) 1-2, 
(II-B) 5-4, (II-B) 5-7 

Instruction fault, system entry for, (II-B) 5-4 

Instruction fetches (memory), (I) 5-11 


Instruction formats 
branch, (I) 3-12 
conventions, (I) 3-10 
floating-point convert, (I) 3-14 
floating-point operate, (I) 3-13 
floating-point to integer move, (I) 3-14 
illegal trap, (II-A) 6-16
memory, (I) 3-11
memory jump, (I) 3-12
operand values, (I) 3-10 
operators, (I) 3-6 
overview, (I) 1-4 
PALcode, (I) 3-14 
registers, (I) 3-1 

Instruction set 
access type field, (I) 3-5 
Boolean, (I) 4-41 
branch, (I) 4-18 
byte manipulate, (I) 4-47 
conditional move (integer), (I) 4-43 
data type field, (I) 3-6 
floating-point subsetting, (I) 4-2
integer arithmetic, (I) 4-24
introduced, (I) 1-6
jump, (I) 4-18
load memory integer, (I) 4-4
miscellaneous, (I) 4-133 
multimedia, (I) 4-154 
name field, (I) 3-5 
opcode qualifiers, (I) 4-3 
operand notation, (I) 3-5 
overview, (I) 4-1 
shift, arithmetic, (I) 4-46
software emulation rules, (I) 4-3
store memory integer, (I) 4-4
VAX compatibility, (I) 4-152
See also Floating-point instructions 


Instruction stream translation buffer (ITB), (III)
2-14


Instruction stream. See I-stream
Instructions, overview, (I) 1-4

INSWH instruction, (I) 4-55 

INSWL instruction, (I) 4-55 

Integer division, A-12

Integer overflow bit, exception summary register, 

(II-C) 4-6 

Integer overflow trap, (II-A) 6-15, (II-B) 5-5, (II-C) 

4-6 


Integer registers 
defined, (I) 3-1 
R31 restrictions, (I) 3-1 
with PALcode switching, (III) 3-8
See also Registers


INTEGER_REGISTER_MASK, (II-C) 4-6


Internal processor registers (IPR) 


address space number, (II-A) 5-4, (II-C) 2-4

AST enable, (II-A) 5-5

AST summary, (II-A) 5-7

CALL_PAL MFPR with, (II-A) 5-1

CALL_PAL MTPR with, (II-A) 5-1

data alignment trap fixup, (II-A) 5-9

defined, (II-A) 1-1

executive stack pointer, (II-A) 5-10

floating-point enable, (II-A) 5-11 

general exception address, (II-C) 2-4 

interprocessor interrupt request, (II-A) 5-12 

interrupt exception address, (II-C) 2-4 

interrupt priority level, (II-A) 5-13 

kernel global pointer, (II-C) 2-4 

kernel mode with, (II-A) 5-1

kernel stack pointer (IKSP), initial, (II-C) 2-4 

machine check error summary, (II-A) 5-14, 
(II-C) 2-5 

memory management exception, (II-C) 2-5 

MFPR instruction with, (II-A) 2-86

MTPR instruction with, (II-A) 2-87

page directory base, (II-C) 2-6

page table base, (II-A) 5-18

PALcode image base address, (II-C) 2-5

panic exception, (II-C) 2-5

performance monitoring, (II-A) 5-15 

privileged context block base, (II-A) 5-16 

process control region base, (II-C) 2-5 

processor base, (II-A) 5-17 

processor status, (II-C) 2-6 

restart execution address, (II-C) 2-6 

returning state of, (II-C) 5-21 

software interrupt request, (II-A) 5-20, (II-C) 
2-6 


software interrupt summary, (II-A) 5-21 
summarized, (II-A) 5-2, (II-C) 2-2
supervisor stack pointer, (II-A) 5-22
system control block base, (II-A) 5-19
system service exception address, (II-C) 2-6
thread environment block base, (II-C) 2-6
thread unique value, (II-C) 2-7
translation buffer check, (II-A) 5-23
translation buffer invalidate all, (II-A) 5-24
translation buffer invalidate all process, (II-A)
5-25
translation buffer invalidate single, (II-A) 5-26
user stack pointer, (II-A) 5-27
virtual page base, (II-A) 5-28
Who-Am-I, (II-A) 5-29
Interprocessor console communications, (III) 2-75 


Interprocessor interrupt, (II-A) 6-21 
generating, (II-B) 2-28 
protocol for, (II-A) 6-21 
service routine entry point, (II-A) 6-29 
Interprocessor interrupt request (IPIR) register 
described, (II-A) 5-12 
protocol for, (II-A) 6-21 
Interrupt acknowledge, (II-C) 4-15 


Interrupt dispatch 

example, (II-C) 4-13

table (IDT), (II-C) 4-13
Interrupt enable mask, (II-C) 4-12 


Interrupt entry (entInt) register, (II-B) 1-2, (II-B) 
5-4, (II-B) 5-7 
Interrupt exception address (INTERRUPT_ENTRY) 
register, (II-C) 2-4 
Interrupt handling 
HAL interface for, (II-C) 1-3 
Interrupt level table (ILT), (II-C) 4-12
index values/names for, (II-C) 2-2
Interrupt mask table (IMT), (II-C) 4-12


Interrupt pending (IP) field, in PS register, (II-A)
6-6
Interrupt priority level, (II-A) 6-7 
Interrupt priority level (IPL) 
at processor initialization, (III) 3-20 
events associated with, (II-A) 6-18 
field in PS register, (II-A) 6-6 
hardware levels, (II-A) 6-7
kernel mode software with, (II-A) 6-18
operation of, (II-A) 6-17
PS with, (II-B) 5-2
recording pending software (SISR register),
(II-A) 5-21
requesting software (SIRR register), (II-A)
service routine entry points, (II-A) 6-29
software interrupts, (II-A) 6-19
software levels, (II-A) 6-7 
with PALcode switching, (III) 3-8 
See also Interrupt priority level (IPL) register 
Interrupt priority level (IPL) register 
described, (II-A) 5-13 
interrupt arbitration, (II-A) 6-35
See also Interrupt priority level (IPL)
Interrupt request levels (IRQL)
ILT table for, (II-C) 4-12
in PSR, (II-C) 2-1
PSR and di instruction, (II-C) 5-6
swapping, (II-C) 5-32
Interrupt service routines
entry point, (II-A) 6-26
in each process, (II-A) 6-18 
introduced, (II-A) 6-17 


Interrupt tables (IDT, ILT, IMT), (II-C) 2-7
Interrupt tables, at initialization, (II-C) 6-3
Interrupt trap frame, building, (II-C) 4-14
Interrupt vectors, mask table for, (II-C) 4-12


Interrupts, (II-C) 4-12
actions, summarized, (II-A) 6-2
disabling, (II-C) 5-6
enabling, (II-C) 5-10 
hardware arbitration, (II-A) 6-34 
I/O device, (II-A) 6-20 
initiated by PALcode, (II-A) 6-31 
initiation, (II-A) 6-18 
instruction completion, (II-A) 6-17 
interprocessor, (II-A) 6-21 
introduced, (II-A) 6-17
PALcode arbitration, (II-A) 6-34
passive release, (II-A) 6-20
powerfail, (II-A) 6-21
processor state transitions, (II-A) 6-37
processor status register and, (II-C) 2-1
program counter value, (II-A) 6-2
returning from, (II-C) 5-27 
software, (II-A) 6-19 
software requests for, (II-C) 4-15 
sources for, (II-B) 5-2 
stack frames for, (II-A) 6-7, (II-B) 5-3
system entry for, (II-B) 5-4
Interval clock interrupt, (II-A) 6-20
HWRPB field for, (III) 2-7
service routine entry point, (II-A) 6-29
intr_flag register, (II-B) 1-3
cleared by retsys, (II-C) 5-26
cleared by rfe, (II-C) 5-28
INV bit
exception summary parameter, (II-A) 6-13
exception summary register, (II-B) 5-6, (II-C)
4-6 
See also Arithmetic traps, invalid operation 


Invalid address exceptions, (II-C) 4-7 
Invalid operation bit, exception summary register, 
(II-C) 4-6 
Invalid operation enable (INVE) 
FP_C quadword bit, B-6 
Invalid operation status (INVS)
FP_C quadword bit, B-5
Invalid operation trap, (II-A) 6-14, (II-B) 5-6,
(II-C) 4-6
INVD bit. See Trap disable bits, invalid operation 
IOCTL console device routine, (III) 2-49 


IOV bit 
exception summary parameter, (II-A) 6-13 
exception summary register, (II-B) 5-5, (II-C)
See also Arithmetic traps, integer overflow
IPI_LEVEL, IRQL table index name, (II-C) 2-2 
IPL. See Interrupt priority level 
IPR. See Internal processor registers (IPR) 
IPR_KSP (internal processor register kernel stack 
pointer), (II-A) 5-1 
IRQL 


See Interrupt request levels 
See also rdirql and swpirql


ISO Latin-1 support, (III) 1-4 
PROCESS_KEYCODE and, (III) 2-38
I-stream 


coherency of, (I) 6-8 
design considerations, A-2 
modifying physical, (I) 5-5 
modifying virtual, (I) 5-5
PALcode with, (I) 6-2 
with caches, (I) 5-5 


ITB. See Instruction stream translation buffer 
ITOFF instruction, (I) 4-125
ITOFS instruction, (I) 4-125
ITOFT instruction, (I) 4-125


J 


JMP instruction, (I) 4-22
JSR instruction, (I) 4-22 
JSR_COROUTINE instruction, (I) 4-22 
Jump instructions, (I) 4-18, (I) 4-22
branch prediction logic, (I) 4-22 
coroutine linkage, (I) 4-23 
return from subroutine, (I) 4-22 


unconditional long jump, (I) 4-23 
See also Control instructions 


K 








kbpt (PALcode) instruction, (II-C) 5-52


Kernel global pointer (KGP) register, (II-B) 1-3, 
(II-C) 2-4
at initialization, (II-C) 6-2
initializing, (II-C) 5-12 
Kernel read enable (KRE) 
bit in PTE, (II-A) 3-5, (II-B) 3-4
with access control violation (ACV) fault, 
(II-A) 3-13 
Kernel stack, (II-C) 2-9 
under/overflow detection, (II-C) 5-54 
Kernel stack pointer (IKSP), initial, (II-C) 2-4
initializing, (II-C) 5-12 
returning contents of, (II-C) 5-17 
swapping to current, (II-C) 5-33
with context switch, (II-C) 2-10, (II-C) 5-31 
with trap frames, (II-C) 4-3 

Kernel stack pointer (KSP) register 


at processor initialization, (III) 3-20 
defined, (II-B) 1-3 

in HWPCB, (II-A) 4-2 

in initial HWPCB, (III) 3-21 

in process context, (II-B) 4-1 

with PALcode switching, (III) 3-8 


Kernel stack, PALcode access to, (II-A) 6-30 
Kernel stack, when corrupted, (II-C) 4-10 
Kernel write enable (KWE) 

bit in PTE, (II-A) 3-4, (II-B) 3-4 
KERNEL_BREAKPOINT breakpoint type, (II-C) 

4-9 

Keycode, translating, (III) 2-38 
KGP. See Kernel global pointer 


Kseg 


format of, (II-B) 3-2 

mapping of, (II-B) 3-1 

physical space with, (II-B) 3-3 
KSP. See Kernel stack pointer 


L 


LANGUAGE environment variable, (III) 2-29
Languages, supported by console, (III) 2-30 
LDA instruction, (I) 4-5 
LDAH instruction, (I) 4-5
LDBU instruction, (I) 4-6 
LDF instruction, (I) 4-91 

when data is unaligned, (II-A) 6-28 
LDG instruction, (I) 4-92 

when data is unaligned, (II-A) 6-28 
LDL instruction, (I) 4-6 

when data is unaligned, (II-A) 6-28


LDL_L instruction, (I) 4-9


restrictions, (I) 4-10 
with processor lock register/flag, (I) 4-10
with STx_C instruction, (I) 4-9 


LDQ instruction, (I) 4-6 
when data is unaligned, (II-A) 6-28 
LDQ_L instruction, (I) 4-9


restrictions, (I) 4-10 

when data is unaligned, (II-A) 6-28

with processor lock register/flag, (I) 4-10 
with STx_C instruction, (I) 4-10


LDQ_U instruction, (I) 4-8 
LDQP (PALcode) instruction, (II-A) 2-85 


LDS instruction, (I) 4-93
when data is unaligned, (II-A) 6-28
with FPCR, (I) 4-84
LDT instruction, (I) 4-94

when data is unaligned, (II-A) 6-28 
LDWU instruction, (I) 4-6 


LEFT_SHIFT(x,y) operator, (I) 3-8 

lg operator, (I) 3-8 

LICENSE environment variable, (III) 2-29
Literals, operand notation, (I) 3-5 

Litmus tests, shared data veracity, (I) 5-17 


Load instructions 


emulation of, (I) 4-3

FETCH instruction, (I) 4-140

Load address, (I) 4-5 

Load address high, (I) 4-5

load byte, (I) 4-6

load longword, (I) 4-6 

load quadword, (I) 4-6 

load quadword locked, (I) 4-10 

load sign-extended longword locked, (I) 4-9 
load unaligned quadword, (I) 4-8 

load word, (I) 4-6 

multiprocessor environment, (I) 5-6 
serialization, (I) 4-143 

when data is unaligned, (II-A) 6-28 
See also Floating-point load instructions 


Load literal, A-14 

Load memory integer instructions, (I) 4-4 
LOAD_LOCKED operator, (I) 3-8
Load-locked, defined, (I) 5-16 

Location, (I) 5-11 

Location access constraints, (I) 5-14 


Lock flag, per-processor 

defined, (I) 3-2 

when cleared, (I) 4-10 

with load locked instructions, (I) 4-10
Lock registers, per-processor 

defined, (I) 3-2 

with load locked instructions, (I) 4-10
Lock variables, with WMB instruction, (I) 4-150 
lock_flag register, (II-B) 1-3 

cleared by retsys, (II-C) 5-26 

cleared by rfe, (II-C) 5-28 
Logical instructions. See Boolean instructions 


Logout area, (II-A) 6-24 
length, per-CPU slot field for, (III) 2-18 
physical address, per-CPU slot field for, (III)
2-18
Longword data type, (I) 2-2 


alignment of, (I) 2-12
atomic access of, (I) 5-2 


LSB (least significant bit), defined for floating-point,
(I) 4-64
LURT table, (III) 2-24 
/M opcode qualifier, IEEE floating-point, (I) 4-67
Machine check error handling, (II-C) 4-17 


Machine check error summary (MCES) register, 
(II-C) 2-5
at processor initialization, (III) 3-20 
defined, (II-B) 1-3 
described, (II-A) 5-14 
format of, (II-C) 4-17 
reading, (II-B) 2-13 
returning contents of, (II-C) 5-18 
structure of, (II-B) 5-7 
using, (II-A) 6-24
with PALcode switching, (III) 3-8 
writing, (II-B) 2-30 
writing values to, (II-C) 5-43 
Machine checks, (II-A) 6-22
actions, summarized, (II-A) 6-2
catastrophic conditions with, (II-C) 4-19
classes of, (II-C) 4-16 
disabling during debug, (II-C) 4-18 
initiated by PALcode, (II-A) 6-31 
interrupt entry for, (II-B) 5-7 
logout area, (II-A) 6-24 
masking, (II-A) 6-23
no disabling of, (II-A) 6-23
one per error, (II-A) 6-24
processor correctable, (II-A) 6-23
program counter (PC) value, (II-A) 6-23
REI instruction with, (II-A) 6-23
retry flag, (II-A) 6-24 
service routine entry points, (II-A) 6-29 
sources for, (II-C) 4-16 
stack frames for, (II-A) 6-7 
system correctable, (II-A) 6-23 
type codes, (II-C) 4-18 
unrecoverable reported, (II-C) 4-18 
Machine checks service routines 
entry point, (II-A) 6-26 
Magtape bootstrap image 
ANSI format, (III) 3-39
boot blocked, (III) 3-40 
Major modes, (III) 3-3 


Major state transitions, (III) 3-2 
console rules for, (III) 3-2 
Major states, (III) 3-1 


MAP_F function, (I) 2-4

MAP_S function, (I) 2-7

MAP_x operator, (I) 3-8

Mask byte instructions, (I) 4-57 

Masking, machine checks with, (II-A) 6-23 
MAX, defined for floating-point, (I) 4-65 
maxCPU, (II-B) 1-2

Maximum ASN value, HWRPB field for, (III) 2-7 
MAXS(x,y) operator, (I) 3-8
MAXSB8 instruction, (I) 4-155 
MAXSW4 instruction, (I) 4-155 
MAXU(x,y) operator, (I) 3-8 
MAXUB8 instruction, (I) 4-155
MAXUW4 instruction, (I) 4-155 


MB (Memory barrier) instruction, (I) 4-143 
compared with WMB, (I) 4-150
multiprocessors only, (I) 4-143
with DMA I/O, (I) 5-22
with LDx_L/STx_C, (I) 4-14
with multiprocessor D-stream, (I) 5-22
with shared data structures, (I) 5-9 
See also IMB, WMB 


MBZ (must be zero), (I) 1-9
MCES. See Machine check error summary 


MCK bit, machine check error summary register, 
(II-A) 5-14, (II-C) 4-18

MEMC. See Memory cluster descriptor 

MEMDSC. See Memory data descriptor table 


Memory access 

aligned byte/word, A-11 

coherency of, (I) 5-1 

granularity of, (I) 5-2 

width of, (I) 5-3 

with WMB instruction, (I) 4-149
Memory alignment, requirement for, (I) 5-2 


Memory barrier instructions. See MB, IMB 
(PALcode), and WMB instructions 


Memory barriers, (I) 5-22 


Memory cluster descriptor (MEMC) table 
structure of, (III) 3-12
Memory clusters, (III) 3-10 


Memory data descriptor (MEMDSC) table 
in HWRPB, (III) 2-11 
offset, HWRPB field for, (III) 2-8 
structure of, (III) 3-12
with cold boot, (III) 3-10


Memory format instructions 
opcodes and format summarized, C-1
Memory instruction format, (I) 3-11 


Memory jump instruction format, (I) 3-12 


Memory management, (II-C) 3-1 
address translation, (II-A) 3-8
always enabled, (II-A) 3-3 
control of, (II-B) 3-3 
faults, (II-A) 3-13, (II-A) 6-10, (II-B) 3-12
introduced, (II-A) 3-1 
page frame number (PFN), (II-A) 3-6 
page table entry (PTE), (II-A) 3-4
protection code, (II-A) 3-7 
protection of individual pages, (II-A) 3-7
PTE modified by software, (II-A) 3-6 
support in PALcode, (I) 6-2 

translation buffer with, (II-A) 3-11 
unrecoverable error, (II-A) 6-22

with interrupts, (II-A) 6-18 

with multiprocessors, (II-A) 3-6

with process context, (II-A) 4-1 

See also Address translation 


Memory management exception 
(MEM_MGMT_ENTRY) register, (II-C) 
2-5 
Memory management fault entry (entMM) register, 
(II-B) 1-3, (II-B) 5-4, (II-B) 5-9


Memory management faults 
registers used, (II-A) 6-10
system entry for, (II-B) 5-4
types, (II-B) 3-12 
with unaligned data, (II-A) 6-16 


Memory prefetch registers 
defined, (I) 3-3

Memory protection, (II-B) 3-6 

Memory sizing at cold boot, (III) 3-10 

Memory-like behavior, (I) 5-3 

MF_FPCR instruction, (I) 4-109 

MFPR_IPR_name (PALcode) instruction, (II-A)

2-86 

MIN, defined for floating-point, (I) 4-65 

MINS(x,y) operator, (I) 3-8 

MINSB8 instruction, (I) 4-155 

MINSW4 instruction, (I) 4-155

MINU(x,y) operator, (I) 3-8 

MINUB8 instruction, (I) 4-155

MINUW4 instruction, (I) 4-155

MIP bit, machine check error summary register, 
(II-B) 5-8 

Miscellaneous instructions, (I) 4-133

MMCSR, (II-B) 5-7 

MMCSR code, (II-B) 3-12

MOP-based network bootstrapping, (III) 3-42 


Move instructions (conditional). See Conditional 
move instructions 


Move, register-to-register, A-14 
MSKBL instruction, (I) 4-57
MSKLH instruction, (I) 4-57
MSKLL instruction, (I) 4-57 
MSKQL instruction, (I) 4-57 
MSKWH instruction, (I) 4-57 
MSKWL instruction, (I) 4-57 
MT_FPCR instruction, (I) 4-109
synchronization requirement, (I) 4-82
MTPR_IPR_name (PALcode) instruction, (II-A) 
2-87 
MULF instruction, (I) 4-127 
MULG instruction, (I) 4-127 


MULL instruction, (I) 4-34
with MULQ, (I) 4-34 

MULQ instruction, (I) 4-35 
with MULL, (I) 4-34
with UMULH, (I) 4-35

MULS instruction, (I) 4-128 


MULT instruction, (I) 4-128 
Multimedia instructions, (I) 4-154 
Multiple instruction issue, A-3 


Multiply instructions 
multiply longword, (I) 4-34 
multiply quadword, (I) 4-35
multiply unsigned quadword high, (I) 4-36
See also Floating-point operate 
Multiprocessor bootstrapping, (III) 3-23
primary processor, (III) 3-23 
Multiprocessor environment 
booting, (III) 3-23 
cache coherency in, (I) 5-6 
console requirements, (III) 2-26 
context switching, (I) 5-24 
interprocessor interrupt, (II-A) 6-21 
I-stream reliability, (I) 5-23
MB and WMB with, (I) 5-22 
memory faults, (II-A) 6-11 
memory management in, (II-A) 3-6 
move operations in, (II-A) 2-74 
no implied barriers, (I) 5-22
read/write ordering, (I) 5-10 
serialization requirements in, (I) 4-143 
shared data, (I) 5-6, A-7
Multithread implementation, (II-A) 2-79


N 


NaN (Not-a-Number) 
conversion to integer, (I) 4-88 
copying, generating, propagating, (I) 4-89
defined, (I) 2-6 
quiet, (I) 4-64 
signaling, (I) 4-64 
NATURALLY ALIGNED data objects, (I) 1-8 
Negate stylized code form, A-15 
Network bootstrapping, (III) 3-42 
New PALcode, (III) 3-5 
Next PC, (II-A) 6-2 
Non-finite number, (I) 4-64 
Nonmapped address space, (II-C) 3-1
Nonmemory-like behavior, (I) 5-3 

NOP, universal (UNOP), A-13 

NOT instruction, ORNOT with zero, (I) 4-42 
NOT operator, (I) 3-9 

NOT stylized code form, A-15 


O


Opcode qualifiers 
default values, (I) 4-3 
notation, (I) 4-3
See also specific qualifiers 
Opcodes 


common architecture, C-1
DIGITAL UNIX PALcode, C-16


in numerical order, C-10

OpenVMS Alpha PALcode, C-14
PALcode in numerical order, C-18 
reserved, C-21

summary, C-8 

unused function codes for, C-21
Windows NT Alpha PALcode, C-17
See also Function codes 


opDec, (II-B) 1-2 
OPEN device routine, (III) 2-51 
determines WRITE characteristics, (III) 2-56 
OpenVMS Alpha PALcode instructions (list), (II-A) 
2-1
OpenVMS Alpha PALcode, instruction summary, 
C-14 
Operand expressions, (I) 3-4 
Operand notation 
defined, (I) 3-4 
Operand values, (I) 3-4 
Operate instruction format
unused function codes with, (I) 3-13 
Operate instructions 
opcodes and format summarized, C-1
Operate instructions, convert with integer overflow, 
(I) 4-78 
Operator halted (OH) flag, (III) 3-36


at multiprocessor boot, (III) 3-23
per-CPU state contains, (III) 2-22


Operators, instruction format, (I) 3-6 
Optimization. See Performance optimizations 
OR operator, (I) 3-9

ORNOT instruction, (I) 4-42

OS Loader, (II-C) 1-2 


Overflow bit, exception summary register, (II-C) 
4-6 
Overflow enable (OVFE)
FP_C quadword bit, B-6 
Overflow status (OVFS) 
FP_C quadword bit, B-5
Overflow trap, (II-A) 6-14, (II-B) 5-5, (II-C) 4-6 


Overlap 
with location access constraints, (I) 5-14
with processor issue constraints, (I) 5-13
with visibility, (I) 5-14

OVF bit 


exception summary parameter, (II-A) 6-13
exception summary register, (II-B) 5-5, (II-C)
4-6 


See also Arithmetic traps, overflow 
OVFD bit. See Trap disable bits, overflow disable


P


Pack to bytes instructions, (I) 4-158

Page directory base (PDR) register, (II-C) 2-6 
initializing, (II-C) 5-12 
maps PTEs, (II-C) 3-3 
with context switch, (II-C) 5-35 

Page directory entry (PDE), (II-C) 3-3 


Page frame number (PFN) 
bits in PTE, (II-A) 3-4, (II-B) 3-4, (II-C) 3-4
determining validation, (II-A) 3-6 
finding for SCB, (II-A) 5-19
in PTE, (II-C) 3-2 
PTBR register, (II-A) 5-18 
when a PDR, (II-C) 3-3 
with address translation, (II-A) 3-8 
with context switch, (II-C) 2-10, (II-C) 5-31 
with hardware context switching, (II-A) 4-3
with physical address translation, (II-B) 3-7 


Page size, HWRPB field for, (III) 2-6 
Page sizes, (II-B) 3-2 
Page table base (PTBR) register, (II-A) 5-18 
at processor initialization, (III) 3-20

defined, (II-B) 1-4 

in HWPCB, (II-A) 4-2

in initial HWPCB, (III) 3-21 

in process context, (II-B) 4-1 

privileged context, (II-A) 2-91 

with address translation, (II-A) 3-8 

with PALcode switching, (III) 3-8 

with physical address translation, (II-B) 3-7 
Page table entry (PTE), (II-A) 3-4 

after software changes, (II-A) 3-11 

atomic modification of, (II-A) 3-6 

bits, summarized, (II-B) 3-4 

calculating at cold boot, (III) 3-16

changing, (II-B) 3-5 

changing and managing, (II-B) 3-5 

format of, (II-B) 3-4 

modified by software, (II-A) 3-6 

page frame number (PFN) with, (II-C) 3-2
page protection, (II-A) 3-7
summary of, (II-C) 3-4 
virtual access of, (II-B) 3-9 
with multiprocessors, (II-A) 3-6 
Page table space 
loading at cold boot, (III) 3-14 
Page tables 
calculating base, (III) 3-16
initial mapping at cold boot, (III) 3-16
physical traversal algorithm, (II-C) 3-3 
traversing, (II-C) 3-3 
Pages 
collecting statistics on, (II-A) 6-11 
individual protection of, (II-A) 3-7 
max address size from, (II-A) 3-3 
possible sizes for, (II-A) 3-2 
size range of, (II-B) 3-1 
virtual address space from, (II-A) 3-2 
PAGES, CRB field for, (III) 2-70 


pageSize, (II-B) 1-2 
PALcode 


access to kernel stack, (II-A) 6-30
argument registers used, (II-C) 5-1 
barriers with, (I) 5-22

CALL_PAL instruction, (I) 4-136 
compared to hardware instructions, (I) 6-1 
current defined, (III) 3-5 

debugging, (II-C) 5-54 

DIGITAL UNIX support for, (II-B) 5-9
event counters during debug, (II-C) 5-55 
identifying the image, (III) 3-5 

illegal operand trap, (II-A) 6-16 
implementation-specific, (I) 6-2 

initial processor context for, (II-C) 6-2 
initialization of, (III) 3-4

initializing environment for, (II-C) 6-1 
instead of microcode, (I) 6-1

instruction format, (I) 3-14 

internal software registers, (II-C) 5-15 
kernel activates, (II-C) 1-2 

loading, (III) 3-4

loading at multiprocessor boot, (III) 3-23 
memory management requirements, (II-A) 3-3 
new defined, (III) 3-5 

OpenVMS Alpha, defined for, (II-A) 2-1 
OS Loader and, (II-C) 1-2 

overview, (I) 6-1 

processor state transitions, (II-A) 6-37 
queue data type support, (II-A) 2-21 
recognized instructions, (I) 6-4 

replacing, (I) 6-3 

required, (I) 6-3 

required instructions, (I) 6-5 

running environment, (I) 6-2

special functions function support, (I) 6-3 
swapping currently executing, (II-C) 5-34 
switching, (II-B) 2-22, (III) 3-5
switching at multiprocessor boot, (III) 3-24 
unexpected exceptions in, (II-C) 4-11 
variants at loading, (III) 3-4 

variants at multiprocessor boot, (III) 3-24
variants at processor initialization, (III) 3-21
version control, (II-C) 2-7 
See also Queues, support for 


PALcode available, per-CPU slot field for, (III)


2-20 


PALcode image base address (PAL_BASE) register, 


(II-C) 2-5 
from initpal, (II-C) 5-12
previous, (II-C) 6-4 
structure of, (II-C) 6-3 


PALcode instructions 


DIGITAL UNIX privileged (list), (II-B) 2-10 

DIGITAL UNIX unprivileged (list), (II-B) 2-1 

opcodes and format summarized, C-1

OpenVMS Alpha (list), (II-A) 2-1 

OpenVMS Alpha privileged (list), (II-A) 2-82

OpenVMS Alpha unprivileged (list), (II-A) 2-3

required, C-20

reserved, function codes for, C-20

VAX compatibility, (II-A) 2-74 

Windows NT Alpha privileged (list), (II-C) 5-2 

Windows NT Alpha unprivileged (list), (II-C) 
5-45 


PALcode instructions, DIGITAL UNIX privileged 


cache flush, (II-B) 2-11 

console service, (II-B) 2-12 

performance monitoring function, (II-B) 2-31 

read machine check error summary, (II-B) 2-13 

read processor status, (II-B) 2-14 

read system value, (II-B) 2-16 

read user stack pointer, (II-B) 2-15

return from system call, (II-B) 2-17 

return from trap, fault, or interrupt, (II-B) 2-18 

swap IPL, (II-B) 2-21

swap PALcode image, (II-B) 2-22

swap process context, (II-B) 2-19 

TB (translation buffer) invalidate, (II-B) 2-24

wait for interrupt, (II-B) 2-35 

who am I, (II-B) 2-25 

write floating-point enable, (II-B) 2-27 

write interprocessor interrupt request, (II-B) 

write kernel global pointer, (II-B) 2-29

write machine check error summary, (II-B) 
2-30 

write system entry address, (II-B) 2-26

write system value, (II-B) 2-33 

write user stack pointer, (II-B) 2-32

write virtual page table pointer, (II-B) 2-34 


PALcode instructions, DIGITAL UNIX unprivileged 


breakpoint, (II-B) 2-2

bugcheck, (II-B) 2-3 

clear floating-point enable, (II-B) 2-5 
generate trap, (II-B) 2-6 

read unique value, (II-B) 2-7 

system call, (II-B) 2-4

write unique value, (II-B) 2-8, (II-B) 2-9


PALcode instructions, OpenVMS Alpha privileged 


cache flush, (II-A) 2-83
console service, (II-A) 2-84 
load quadword physical, (II-A) 2-85
move from processor register, (II-A) 2-86

move to processor register, (II-A) 2-87 

store quadword physical, (II-A) 2-88 

swap PALcode image, (II-A) 2-92 

swap privileged context, (II-A) 2-89
PALcode instructions, OpenVMS Alpha unprivileged 

breakpoint, (II-A) 2-4 

bugcheck, (II-A) 2-5 

change to executive mode, (II-A) 2-6 

change to kernel mode, (II-A) 2-7

change to supervisor mode, (II-A) 2-8 

change to user mode, (II-A) 2-9 

clear floating-point trap, (II-A) 2-10

generate software trap, (II-A) 2-11 

insert into queue (list), (II-A) 2-29 

probe for read access, (II-A) 2-12 

probe for write access, (II-A) 2-12 

read processor status, (II-A) 2-14 

read system cycle counter, (II-A) 2-17

read unique context, (II-A) 2-80

return from exception or interrupt, (II-A) 2-15

swap AST enable, (II-A) 2-19

thread, (II-A) 2-79

write PS software field, (II-A) 2-20

write unique context, (II-A) 2-81 


PALcode instructions, required privileged, (I) 6-5 
PALcode instructions, required unprivileged, (I) 6-5


PALcode instructions, Windows NT Alpha privileged 


clear software interrupt request, (II-C) 5-4 

data TB invalidate single, (II-C) 5-8 

disable alignment fixups, (II-C) 5-5, (II-C) 5-9 

disable all interrupts, (II-C) 5-6 

drain all aborts, (II-C) 5-7 

enable alignment fixups, (II-C) 5-9

enable interrupts, (II-C) 5-10 

halt operating system, (II-C) 5-11 

initialize PALcode data structures, (II-C) 5-12, 
(II-C) 5-14 

initialize processor control region data, (II-C) 

read current IRQL, (II-C) 5-16 

read initial kernel stack pointer, (II-C) 5-17 

read internal processor state, (II-C) 5-21 

read machine check error summary register, 

(II-C) 5-18 
read processor (PSR) status register, (II-C) 5-20 
read processor control region base address, 
(II-C) 5-19 

read software event counters, (II-C) 5-15 

read thread value, (II-C) 5-22

restart operating system, (II-C) 5-24 

return from exception or interrupt, (II-C) 5-27 

return from system service call exception, (II-C) 
5-25 

set software interrupt request, (II-C) 5-29

swap current IRQL, (II-C) 5-32 

swap current PALcode, (II-C) 5-34 

swap initial kernel stack pointer, (II-C) 5-33 

swap process context, (II-C) 5-35 

swap thread context, (II-C) 5-30 

transfer to console firmware, (II-C) 5-23 

translation buffer invalidate all, (II-C) 5-36 
translation buffer invalidate multiple, (II-C)
5-37 

translation buffer invalidate multiple for ASN, 
(II-C) 5-38 

translation buffer invalidate single, (II-C) 5-39 

translation buffer invalidate single for ASN, 
(II-C) 5-40

write kernel exception entry routine, (II-C) 5-41 

write machine check error summary register, 
(II-C) 5-43 

write performance monitor, (II-C) 5-44 

PALcode instructions, Windows NT Alpha 
unprivileged 

breakpoint trap, (II-C) 5-46 

call kernel debugger, (II-C) 5-47 

generate a trap, (II-C) 5-50 

instruction memory barrier, (II-C) 5-51 

kernel breakpoint trap, (II-C) 5-52 

read TEB pointer, (II-C) 5-53 

system service call, (II-C) 5-48


PALcode loaded (PL) flag, (III) 3-4
at multiprocessor boot, (III) 3-23 
per-CPU state contains, (III) 2-21 
PALcode loading at bootstrap, (III) 3-13


PALcode memory space 


length of, (III) 2-17 
physical address of, (III) 2-17
with PALcode loading, (III) 3-4 


PALcode memory valid (PMV) flag


at multiprocessor boot, (III) 3-23 
per-CPU state contains, (III) 2-22 
with PALcode loading, (III) 3-4


PALcode opcodes in numerical order, C-18
PALcode revision, per-CPU slot field for, (III) 2-17
with PALcode switching, (III) 3-6 
PALcode scratch space 
length of, (III) 2-17


physical address of, (III) 2-17
with PALcode loading, (III) 3-4 


PALcode scratch value
in initial HWPCB, (III) 3-21
PALcode swapping, (II-A) 2-92 


PALcode valid (PV) flag 


at multiprocessor boot, (III) 3-23 
per-CPU state contains, (III) 2-22 
with PALcode loading, (III) 3-4 


PALcode variation 2, (III) 3-8 
PALcode variation assignments, D-2 
Panic exception (PANIC_ENTRY) register, (II-C) 
2-5 
Panic exceptions, (II-C) 4-10
kernel stack under/overflow, (II-C) 5-54
trap from and dispatch for, (II-C) 4-11 


Panic stack, (II-C) 2-9 
Panic stack pointer, (II-C) 2-7 
PANIC_STACK_SWITCH code, (II-C) 4-10
Passive release interrupts, (II-A) 6-20

entry point, (II-A) 6-29 
PASSIVE_LEVEL, IRQL table index name, (II-C) 

2-2 
PC halted, per-CPU slot fields for, (III) 2-19 
PC. See Program counter 
PCB. See Process control block 
PCBB. See Process control block base 
PCC_CNT, (I) 3-3, (I) 4-144
PCC_OFF, (I) 3-3, (I) 4-144
PCE bit, machine check error summary register,
(II-A) 5-14, (II-B) 5-8, (II-C) 4-18

Per-CPU slots 

block for, (III) 2-10

fields for, (III) 2-17 

in HWRPB, (III) 2-15

number, HWRPB field for, (III) 2-8 

size, HWRPB field for, (III) 2-8 

state flags at multiprocessor boot, (III) 3-23

state flags in, (III) 2-21

with PALcode switching, (III) 3-7
Performance monitor (PME) register 

privileged context, (II-A) 2-91 
Performance monitor interrupt entry point, (II-A) 

6-29 

Performance monitoring, E-3, E-9 


Performance monitoring enable (PME) bit 
defined, (II-B) 1-4 
in HWPCB, (II-A) 4-2 
in process context, (II-B) 4-1 
Performance monitoring register (PERFMON), 
(II-A) 5-15
writing, (II-B) 2-31 
Performance optimizations 
branch prediction, A-3 
code sequences, A-10
data stream, A-6 
for I-streams, A-2
instruction alignment, A-2
instruction scheduling, A-5
I-stream density, A-5
multiple instruction issue, A-3 
shared data, A-7 
Performance tuning 
IMPLVER instruction with, (I) 4-142 


PERR (Pixel error) instruction, (I) 4-157
PFN. See Page frame number 
Physical address size, HWRPB field for, (III) 2-6 


Physical address space, (II-A) 3-3, (II-B) 3-3, 
(II-C) 3-2 
described, (I) 5-1 
Physical address translation, (II-A) 3-9, (II-B) 3-7,
(II-C) 3-2

PHYSICAL_ADDRESS operator, (I) 3-9 

Pipelined implementations, using EXCB instruction 
with, (I) 4-139 

Pixel error instruction, (I) 4-157 

PKLB (Pack longwords to bytes) instruction, (I)
4-158 

PKWB (Pack words to bytes) instruction, (I) 4-158 

PME. See Performance monitoring enable 

PMI bus, uncorrected protocol errors, (II-A) 6-22


Powerfail and recovery 


multiprocessor type of, (III) 3-29 
split type of, (III) 3-29 
uniprocessor type of, (III) 3-28 
united type of, (III) 3-29 


Powerfail interrupt, (II-A) 6-21 

service routine entry point, (II-A) 6-29 
Powerfail restart (PR) flag 

powerfail and recovery, (III) 3-29 
Powerfail, CFLUSH PALcode instruction with, 

(II-A) 6-22

Power-up initialization, (III) 3-3
Prefetch data (FETCH instruction), (I) 4-140
Pre-PALcode initialization, (II-C) 6-1 


Primary bootstrap image 


format of, (III) 3-36 
loading at cold, (III) 3-14


Primary processor 


at multiprocessor boot, (III) 3-23 

definition of, (III) 1-1 

modes for, (III) 3-3

running at multiprocessor boot, (III) 3-25
switching from, (III) 3-31 
Primary-eligible (PE) bit 

at multiprocessor boot, (III) 3-23

with BB_WATCH, (III) 3-44

with console switching, (III) 3-31


PRIORITY_ENCODE operator, (I) 3-9 
Privileged Architecture Library. See PALcode 
Privileged context, (II-A) 2-91


Privileged context block base (PCBB) register, 
(II-A) 5-16 
at processor initialization, (III) 3-20
with PALcode switching, (III) 3-8


Privileges, processor, (II-C) 2-2 

PROBER (PALcode) instruction, (II-A) 2-12 
PROBEW (PALcode) instruction, (II-A) 2-12 
Process, (II-A) 4-1 

Process context, (II-B) 4-1 
saved in PCB, (II-B) 4-2

Process control block (PCB), (II-B) 4-2
structure, (II-B) 4-2

Process control block (PCB) register, (II-B) 1-3 


Process control block base (PCBB) register, (II-B) 
1-3 
Process control region base (PCR) register, (II-C)
2-5 
Process unique value (unique) register, (II-B) 1-4 
in process context, (II-B) 4-1 
PROCESS_KEYCODE console terminal routine, 
(III) 2-38
Processor 
adding to running system, (III) 3-26 
states and modes, (III) 3-1 
Processor available (PA) flag 
at multiprocessor boot, (III) 3-23
per-CPU state contains, (III) 2-22
Processor base (PRBR) register, (II-A) 5-17
Processor communication, (I) 5-15 


Processor control block (PRCB) 
at initialization, (II-C) 6-2 
Processor control region, (II-C) 2-7 
interrupt tables with, (II-C) 2-7 
Processor control region base (PCR) register 
at initialization, (II-C) 6-2 
initializing, (II-C) 5-12 
returning contents of, (II-C) 5-19 
Processor correctable errors, (II-C) 4-16 
reporting, (II-C) 4-18 
Processor cycle counter (PCC) register, (I) 3-3


for Digital UNIX, (II-B) 1-4 

for OpenVMS Alpha, (II-A) 1-2 

in initial HWPCB, (III) 3-21

RPCC instruction with, (I) 4-144
system cycle counter with, (II-A) 2-17


See also Charged process cycles 
Processor data areas, (II-C) 2-7


Processor hardware interrupt, service routine entry 
points, (II-A) 6-29 

Processor initialization, (III) 3-20 

Processor issue constraints, (I) 5-12

Processor issue sequence, (I) 5-12 

Processor modes, (II-A) 3-1, (II-C) 2-1, (III) 3-3
AST pending state, (II-A) 5-7 
change to executive, (II-A) 2-6 
change to kernel, (II-A) 2-7 
change to supervisor, (II-A) 2-8
change to user, (II-A) 2-9 
controlling memory access, (II-A) 3-7 
enabling executive mode reads, (II-A) 3-5 
enabling executive mode writes, (II-A) 3-4 
enabling kernel mode reads, (II-A) 3-5




enabling supervisor mode reads, (II-A) 3-5 

enabling supervisor mode writes, (II-A) 3-4 

enabling user mode reads, (II-A) 3-4 

enabling user mode writes, (II-A) 3-4 

page access with, (II-A) 3-2 

PALcode state transitions, (II-A) 6-37 
Processor present (PP) flag 


at multiprocessor boot, (III) 3-23 
per-CPU state contains, (III) 2-22


Processor stacks, (II-A) 6-7 

Processor state transitions, (II-A) 6-37
Processor state, defined, (II-A) 6-5

Processor state, internal, initialized, (II-C) 6-1 


Processor status (PS) register 
at processor initialization, (III) 3-20 
bit meanings for, (II-B) 5-2 
bit summary, (II-A) 6-6 
bootstrap values in, (II-A) 6-6 
current, (II-A) 6-5
defined, (II-A) 1-1, (II-B) 1-4
explicit reading/writing of, (II-A) 6-5 
in process context, (II-B) 4-1 
in processor state, (II-A) 6-5 
saved on stack, (II-A) 6-5
saved on stack frame, (II-A) 6-7
with PALcode switching, (III) 3-8 
WR_PS_SW instruction, (II-A) 2-20


Processor status (PSR) register, (II-C) 2-1, (II-C) 
2-6 
returning contents of, (II-C) 5-20 
Processor type assignments, D-1 


Processor uncorrectable errors, (II-C) 4-17 
Processor unique value, (III) 3-8 


Processor unique value (unique) register 
in initial HWPCB, (III) 3-21
with PALcode switching, (III) 3-8 
Processor, per-CPU slot field for 
halt, (III) 2-19
revision, (III) 2-18 
serial number, (III) 2-18 
software compatibility, (III) 2-20
type, (III) 2-18 
variation, (III) 2-18 
Processors, switching primary, (III) 2-64 


Program counter (PC) register, (I) 3-1


alignment, (II-A) 6-6 

current PC defined, (II-A) 6-2
defined, (II-B) 1-3 

explicit reading of, (II-A) 6-7 

in process context, (II-B) 4-1 

in processor state, (II-A) 6-5 
saved on stack frame, (II-A) 6-7 
with arithmetic traps, (II-A) 6-14, (II-B) 5-1
with EXCB instruction, (I) 4-139 
with faults, (II-A) 6-9

with interrupts, (II-A) 6-2 

with machine checks, (II-A) 6-23
with PALcode switching, (III) 3-8
with synchronous traps, (II-A) 6-15 


Program I/O mode, (III) 3-3

Protection code, (II-A) 3-7, (II-B) 3-6 
Protection modes, (II-A) 6-7 

PS. See Processor status 

PS<SP_ALIGN> field, (II-A) 2-14
Pseudo-ops, A-16 

PSR. See Processor status register 
PSWITCH console routine, (III) 2-64, (III) 3-32 
PTBR. See Page table base 

PTE. See Page table entry 

PUTS console terminal routine, (III) 2-40


Q 


Quadword data type, (I) 2-2 
alignment of, (I) 2-3, (I) 2-12
atomic access of, (I) 5-2 
integer floating-point format, (I) 2-12
loading in physical memory, (II-A) 2-85 
storing to physical memory, (II-A) 2-88 
T_floating with, (I) 2-12 

Queues, support for 


absolute longword, (II-A) 2-21 
absolute quadword, (II-A) 2-24 
PALcode instructions (list), (II-A) 2-29
self-relative longword, (II-A) 2-21
self-relative quadword, (II-A) 2-25


R 


R31 
restrictions, (I) 3-1 
with arithmetic traps, (II-A) 6-12 


RAZ (read as zero), (I) 1-9 

RC (read and clear) instruction, (I) 4-153 
RD_PS (PALcode) instruction, (II-A) 2-14 
rdcounters (PALcode) instruction, (II-C) 5-15 
rdirql (PALcode) instruction, (II-C) 5-16


rdksp (PALcode) instruction, (II-C) 5-17 


reads IKSP register, (II-C) 2-4 
reads kernel stack, (II-C) 2-9 


rdmces (PALcode) instruction, (II-B) 2-13, (II-C)
5-18 
rdpcr (PALcode) instruction, (II-C) 5-19 
reads PCR register, (II-C) 2-5
rdps (PALcode) instruction, (II-B) 2-14 
rdpsr (PALcode) instruction, (II-C) 5-20 
rdstate (PALcode) instruction, (II-C) 5-21 
rdteb (PALcode) instruction, (II-C) 5-53
reads TEB register, (II-C) 2-6
rdthread (PALcode) instruction, (II-C) 5-22

reads THREAD register, (II-C) 2-7 
RDUNIQUE (PALcode) instruction 

required recognition of, (I) 6-4 
rdunique (PALcode) instruction, (II-B) 2-7 
rdusp (PALcode) instruction, (II-B) 2-15 
rdval (PALcode) instruction, (II-B) 2-16 
READ device routine, (III) 2-53


Read/write ordering (multiprocessor), (I) 5-10
determining requirements, (I) 5-10
hardware implications for, (I) 5-29
memory location defined, (I) 5-11

Read/write, sequential, A-9

READ_UNQ (PALcode) instruction, (II-A) 2-80


Reason-for-halt code 
at power-up initialization, (III) 3-4
reboot (PALcode) instruction, (II-C) 5-23 
operation of, (II-C) 6-3 
tasks and sequence for, (II-C) 6-5 
Regions in physical address space, (I) 5-1 
Regions, bootstrap address space, (III) 3-13 
Register mask, floating-point and integer, (II-C) 4-5 


Register write mask, with arithmetic traps, (II-A) 
6-13 
Registers, (I) 3-1 
DIGITAL UNIX usage, (II-B) 1-1 
floating-point, (I) 3-2 
integer, (I) 3-1 
lock, (I) 3-2 
memory prefetch, (I) 3-3 
OpenVMS Alpha usage of, (II-A) 1-1
optional, (I) 3-3 
processor cycle counter, (I) 3-3 
program counter (PC), (I) 3-1
value when unused, (I) 3-10 
VAX compatibility, (I) 3-3 
Windows NT Alpha usage of, (II-C) 1-3 
with IPRs, (II-A) 5-1 
See also specific registers 
Register-to-register move, A-14 


REI (PALcode) instruction, (II-A) 2-15
arithmetic traps, (II-A) 6-9
faults, (II-A) 6-9
interrupt arbitration, (II-A) 6-36
interrupts, (II-A) 6-2
machine checks, (II-A) 6-23
synchronous traps, (II-A) 6-15 
Relational Operators, (I) 3-9 


Remove from queue PALcode instructions 
longword, (II-A) 2-70
longword at head interlocked, (II-A) 2-50
longword at head interlocked resident, (II-A)
2-53
longword at tail interlocked, (II-A) 2-60
longword at tail interlocked resident, (II-A) 


quadword, (II-A) 2-72 

quadword at head interlocked, (II-A) 2-55 

quadword at head interlocked resident, (II-A)
2-58 

quadword at tail interlocked, (II-A) 2-65 

quadword at tail interlocked resident, (II-A) 
2-68 


REMQHIL (PALcode) instruction, (II-A) 2-50 
REMQHILR (PALcode) instruction, (II-A) 2-53
REMQHIQ (PALcode) instruction, (II-A) 2-55
REMQHIQR (PALcode) instruction, (II-A) 2-58
REMQTIL (PALcode) instruction, (II-A) 2-60
REMQTILR (PALcode) instruction, (II-A) 2-63
REMQTIQ (PALcode) instruction, (II-A) 2-65
REMQTIQR (PALcode) instruction, (II-A) 2-68
REMQUEL (PALcode) instruction, (II-A) 2-70
REMQUEL/D (PALcode) instruction, (II-A) 2-70
REMQUEQ (PALcode) instruction, (II-A) 2-72 
REMQUEQ/D (PALcode) instruction, (II-A) 2-72 
Representative result, (I) 4-64

Reserved instructions, opcodes for, C-21
RESET_ENV variable routine, (III) 2-59
RESET_TERM console terminal routine, (III) 2-42 


restart (PALcode) instruction, (II-C) 5-24 
tasks and sequence for, (II-C) 6-5
Restart block 
with catastrophic errors, (II-C) 4-19 
Restart block pointer, (II-C) 2-7, (II-C) 6-3 
Restart execution address (RESTART_ADDRESS) 
register, (II-C) 2-6 
at PALcode exit, (II-C) 5-1 
RESTART RTN VA, HWRPB field for, (III) 2-9 


RESTART value, HWRPB field for, (III) 2-9 


Restart-capable (RC) flag 
at multiprocessor boot, (III) 3-23 
at processor initialization, (III) 3-20 
per-CPU state contains, (III) 2-22
state transitions and, (III) 3-1
with failed bootstrap, (III) 3-18
with secondary console, (III) 3-26 
RESTORE_TERM console routine, (III) 3-34, (III)
3-35
RESTORE_TERM RTN VA, HWRPB field for,
(III) 2-9
RESTORE_TERM value, HWRPB field for, (III) 
2-9 


Result latency, A-5
RET instruction, (I) 4-22
retsys (PALcode) instruction, (II-B) 2-17, (II-C) 
5-25 
PS with, (II-B) 5-2 
use of, (II-C) 4-2 
Revision, HWRPB field for, (III) 2-6 
rfe (PALcode) instruction, (II-C) 5-27 


compared to retsys, (II-C) 5-25
use of, (II-C) 4-2 


RIGHT_SHIFT(x,y) operator, (I) 3-9 
ROM boot block structure, (III) 3-41 
ROM bootstrapping, (III) 3-41 
Rounding modes. See Floating-point rounding modes 
RPCC (read processor cycle counter) instruction, (I) 
4-144 
RSCC instruction with, (II-A) 2-18
RS (read and set) instruction, (I) 4-153 
RSCC (PALcode) instruction, (II-A) 2-17 
RPCC instruction with, (II-A) 2-18
rti (PALcode) instruction, (II-B) 2-18 


PS with, (II-B) 5-2 
with exceptions, (II-B) 5-1 


RX BUFFER, field in RXTX buffer area, (III) 2-77 
RXLEN, field in RXTX buffer area, (III) 2-77
RXRDY bitmask, HWRPB field for, (III) 2-10 


RXRDY flag, (III) 2-75
at multiprocessor boot, (III) 3-23 
RXTX buffer area, (III) 2-76 
per-CPU slot field for, (III) 2-20


S


S_floating data type


alignment of, (I) 2-8 

compared to F_floating, (I) 2-8 
exceptions, (I) 2-8 

mapping, (I) 2-7

MAX/MIN, (I) 4-65 

NaN with T_floating convert, (I) 4-88 
operations, (I) 4-62 

when data is unaligned, (II-A) 6-28 


S4ADDL instruction, (I) 4-26 
S4ADDQ instruction, (I) 4-28 
S4SUBL instruction, (I) 4-38
S4SUBQ instruction, (I) 4-40
S8ADDL instruction, (I) 4-26
S8ADDQ instruction, (I) 4-28
S8SUBL instruction, (I) 4-38
S8SUBQ instruction, (I) 4-40
SAVE_ENV variable routine, (III) 2-60
SAVE_TERM console routine, (III) 3-34, (III) 3-35

SAVE_TERM RTN VA, HWRPB field for, (III)
2-9 

SAVE_TERM value, HWRPB field for, (III) 2-9 

SBZ (should be zero), (I) 1-9 

SCC. See System cycle counter 

SCE bit, machine check error summary register, 

(II-A) 5-14, (II-B) 5-8, (II-C) 4-18
Secondary processors 


at multiprocessor boot, (III) 3-23 
definition of, (III) 1-1 
modes for, (III) 3-3 


Security holes, (I) 1-7 

with UNPREDICTABLE results, (I) 1-8 
Seg0, mapping of, (II-B) 3-1 
Seg1, mapping of, (II-B) 3-1
Self-relative longword queue, (II-A) 2-21
Self-relative quadword queue, (II-A) 2-25
Sequential read/write, A-9
Serialization, MB instruction with, (I) 4-143 
SET_ENV variable routine, (III) 2-62 
SET_TERM_CTL terminal console routine, (III) 

2-43 
SET_TERM_INT console terminal routine, (III)
2-44

SEXT(x) operator, (I) 3-9 
Shared data (multiprocessor), A-7

changed vs. updated datum, (I) 5-6
Shared data structures 

atomic update, (I) 5-7 

ordering considerations, (I) 5-9

using memory barrier (MB) instruction, (I) 5-9 
Shared memory 

accessing, (I) 5-11 

defined, (I) 5-10
Shift arithmetic instructions, (I) 4-46
Sign extend instructions, (I) 4-60 
Single-precision floating-point, (I) 4-62 
SLL instruction, (I) 4-45 
Software (SW) field, in PS register, (II-A) 6-6 
Software completion bit, exception summary register, 

(II-A) 6-13, (II-B) 5-6, (II-C) 4-6

Software considerations, A-1

See also Performance optimizations 
Software exceptions, (II-C) 4-8 


Software interrupt request (SIRR) register, (II-C) 
2-6 
clearing, (II-C) 5-4
described, (II-A) 5-20
format for, (II-C) 4-15
interrupt arbitration, (II-A) 6-35, (II-A) 6-36
protocol for, (II-A) 6-19 
See also Software interrupts 
Software interrupt summary (SISR) register 
at processor initialization, (III) 3-20
described, (II-A) 5-21 
protocol for, (II-A) 6-19 
Software interrupts, (II-A) 6-19 


asynchronous system traps (AST), (II-A) 6-20 
protocol between summary and request, (II-A) 


recording pending state of, (II-A) 5-21 

request (SIRR) register, (II-A) 6-19 

requesting, (II-A) 5-20, (II-C) 4-15

requests after exception handling, (II-C) 5-25, 
(II-C) 5-27 

service routine entry points, (II-A) 6-29 

setting, (II-C) 5-29 

summary (SISR) register, (II-A) 6-19 

supported levels of, (II-A) 5-20


Software traps, generating, (II-A) 2-11 
SP. See Stack pointer 

SQRTF instruction, (I) 4-129 

SQRTG instruction, (I) 4-129

SQRTS instruction, (I) 4-130 

SQRTT instruction, (I) 4-130


Square root instructions 
IEEE, (I) 4-130
VAX, (I) 4-129
SRA instruction, (I) 4-46
SRL instruction, (I) 4-45
ssir (PALcode) instruction, (II-C) 5-29
sets software interrupts, (II-C) 4-16 
Stack alignment, (II-A) 6-31 
Stack alignment (SP_ALIGN), field in saved PS, 
(II-A) 6-6 
Stack frames, (II-A) 6-7, (II-B) 5-3 
Stack pointer (SP) register 
defined, (II-A) 1-1, (II-B) 1-4
linkage for, (II-B) 1-1 
State flags, per-CPU slot field for, (III) 2-17
STATUS_ALPHA_ARITHMETIC code, (II-C) 4-5 


STATUS_ALPHA_GENTRAP code, (II-C) 4-8 


STATUS_BREAKPOINT code, (II-C) 4-9 


STATUS_DATATYPE_MISALIGNMENT code, 
(II-C) 4-7 
STATUS_ILLEGAL_INSTRUCTION code, (II-C)
4-7
STATUS_INVALID_ADDRESS code, (II-C) 4-7 
STB instruction, (I) 4-15
STF instruction, (I) 4-95
when data is unaligned, (II-A) 6-28 
STG instruction, (I) 4-96 
when data is unaligned, (II-A) 6-28 
STL instruction, (I) 4-15 
when data is unaligned, (II-A) 6-28 
STL_C instruction, (I) 4-12 
when data is unaligned, (II-A) 6-28 
when guaranteed ordering with LDL_L, (I)
4-14 
with LDx_L instruction, (I) 4-12 
with processor lock register/flag, (I) 4-12 
Storage, defined, (I) 5-14 


Store instructions 


emulation of, (I) 4-3 

FETCH instruction, (I) 4-140 
multiprocessor environment, (I) 5-6
serialization, (I) 4-143 

store byte, (I) 4-15

store longword, (I) 4-15

store longword conditional, (I) 4-12
store quadword, (I) 4-15

store quadword conditional, (I) 4-12
store word, (I) 4-15

STQ_U, (I) 4-17

when data is unaligned, (II-A) 6-28
See also Floating-point store instructions 


Store memory integer instructions, (I) 4-4
STORE_CONDITIONAL operator, (I) 3-9 
Store-conditional, defined, (I) 5-16 
STQ instruction, (I) 4-15

when data is unaligned, (II-A) 6-28
STQ_C instruction, (I) 4-12 

when data is unaligned, (II-A) 6-28 

when guaranteed ordering with LDQ_L, (I)

4-14 

with LDx_L instruction, (I) 4-12

with processor lock register/flag, (I) 4-12 
STQ_U instruction, (I) 4-17
STQP (PALcode) instruction, (II-A) 2-88
STS instruction, (I) 4-97 


when data is unaligned, (II-A) 6-28 
with FPCR, (I) 4-84 


STT instruction, (I) 4-98
when data is unaligned, (II-A) 6-28 
STW instruction, (I) 4-15 
SUBF instruction, (I) 4-131
SUBG instruction, (I) 4-131
SUBL instruction, (I) 4-37
SUBQ instruction, (I) 4-39
SUBS instruction, (I) 4-132
SUBT instruction, (I) 4-132


Subtract instructions
subtract longword, (I) 4-37
subtract quadword, (I) 4-39
subtract scaled longword, (I) 4-38
subtract scaled quadword, (I) 4-40
See also Floating-point operate 


SUM bit. See Summary bit 

Summary bit, in FPCR, (I) 4-80 

Superpage address space, (II-C) 3-1 

Supervisor read enable (SRE), bit in PTE, (II-A) 3-5


Supervisor stack pointer (SSP) register, (II-A) 5-22 
as internal processor register, (II-A) 5-1 
in HWPCB, (II-A) 4-2 
in initial HWPCB, (III) 3-21 
Supervisor write enable (SWE), bit in PTE, (II-A) 
3-4


SWASTEN (PALcode) instruction, (II-A) 2-19 
interrupt arbitration, (II-A) 6-36
with ASTEN register, (II-A) 5-6 
SWC bit 
exception summary parameter, (II-A) 6-13 
exception summary register, (II-B) 5-2, (II-B) 
5-6, (II-C) 4-6 
SWPCTX (PALcode) instruction, (II-A) 2-89 
with ASTSR register, (II-A) 5-8 
swpctx (PALcode) instruction, (II-B) 2-19, (II-C)
5-30 
PCB with, (II-B) 4-2 
PDR register with, (II-C) 2-6 
with ASNs, (II-B) 3-10 
writes IKSP register, (II-C) 2-4
writes TEB register, (II-C) 2-6 
writes THREAD register, (II-C) 2-7 
swpipl (PALcode) instruction, (II-B) 2-21
PS with, (II-B) 5-2 
swpirql (PALcode) instruction, (II-C) 5-32 
as synchronization function, (II-C) 4-15 
swpksp (PALcode) instruction, (II-C) 5-33 
reads kernel stack, (II-C) 2-9 
writes IKSP register, (II-C) 2-4 
SWPPAL (PALcode) instruction, (II-A) 2-92
required recognition of, (I) 6-4
with PALcode switching, (III) 3-6 
swppal (PALcode) instruction, (II-B) 2-22, (II-C) 
5-34, (II-C) 6-6 
firmware contributes, (II-C) 1-2 
required recognition of, (I) 6-4
swpprocess (PALcode) instruction, (II-C) 5-35 
writes PDR register, (II-C) 2-6 
Synchronization levels, interrupt, (II-C) 4-13 
Synchronous traps, (II-A) 6-9, (II-B) 5-2
data alignment, (II-A) 6-15
defined, (II-A) 6-9
program counter (PC) value, (II-A) 6-15
REI instruction with, (II-A) 6-15
System call entry (entSys) register, (II-B) 1-3, (II-B)
5-4, (II-B) 5-9 


System control block (SCB) 
arithmetic trap entry points, (II-A) 6-27 
fault entry points, (II-A) 6-27
finding PFN, (II-A) 5-19
saved on stack frame, (II-A) 6-7
structure of, (II-A) 6-26
with memory management faults, (II-A) 3-13
System control block base (SCBB) register, (II-A) 
5-19 
specifies PFN, (II-A) 6-26 
System correctable errors, (II-C) 4-16 
reporting, (II-C) 4-18 
System crash, requesting, (III) 3-31 


System cycle counter (SCC) register 
at processor initialization, (III) 3-20 
reading, (II-A) 2-17
System entry addresses, (II-B) 5-4
System initialization, (III) 3-3 


System restarts, (III) 3-27 


error halt and recovery, (III) 3-30

forcing console I/O mode, (III) 3-36

powerfail and recovery (multiprocessor), (III)
9 


powerfail and recovery (split), (III) 3-29

powerfail and recovery (uniprocessor), (III) 
3-28

powerfail and recovery (united), (III) 3-29

primary switching, (III) 3-31 

requesting a crash, (III) 3-31

RESTORE_TERM routine, (III) 3-34, (III)
3-35

restoring terminal state, (III) 3-33 

SAVE_TERM routine, (III) 3-34, (III) 3-35

saving terminal state, (III) 3-33 


System serial number, HWRPB field for, (III) 2-7 
System service call exceptions, (II-C) 4-4
returning from, (II-C) 5-25 
System service exception address 
(SYSCALL_ENTRY) register, (II-C) 2-6 
System uncorrectable errors, (II-C) 4-17 
System value (sysvalue) register, (II-B) 1-4 
with PALcode switching, (III) 3-8 
System variation field (HWRPB) 
bit summary, (III) 2-13
System, HWRPB field for
revision code, (III) 2-7, (III) 2-11
serial number, (III) 2-11
type, (III) 2-7, (III) 2-13
variation, (III) 2-7, (III) 2-13
Sysvalue. See System value
T


T_floating data type 
alignment of, (I) 2-9 
exceptions, (I) 2-9
format, (I) 2-9 
MAX/MIN, (I) 4-65 
NaN with S_floating convert, (I) 4-88
when data is unaligned, (II-A) 6-28 


Tape. See Magtape 
TB hint offset, HWRPB field for, (III) 2-8 
TB. See Translation buffer 
TBB. See Translation buffer hint block 
tbi (PALcode) instruction, (II-B) 2-24
with TBs, (II-B) 3-10
tbia (PALcode) instruction, (II-C) 3-6, (II-C) 5-36
tbim (PALcode) instruction, (II-C) 3-6, (II-C) 5-37
tbimasn (PALcode) instruction, (II-C) 3-6, (II-C) 
5-38 
tbis (PALcode) instruction, (II-C) 3-6, (II-C) 5-39 
tbisasn (PALcode) instruction, (II-C) 3-6, (II-C) 
5-40
Temporary PALcode registers, (II-C) 5-1





Terminal console 
setting controls, (III) 2-43
Terminals
setting interrupts for, (III) 2-44
TEST(x,cond) operator, (I) 3-10
TESTED_PAGES, memory cluster field, (III) 3-13


Thread environment block base (TEB) register, 
(II-C) 2-6 

initializing, (II-C) 5-12 

returning contents of, (II-C) 5-53 

with context switch, (II-C) 2-10, (II-C) 5-31 
Thread unique value (THREAD) register, (II-C) 2-7

initializing, (II-C) 5-12 

returning contents of, (II-C) 5-22 

with context switch, (II-C) 2-10, (II-C) 5-31 
Timeliness of location access, (I) 5-17 
Timer support, HAL interface for, (II-C) 1-3
Timing considerations, atomic sequences, A-18 


Translation 
physical, (II-B) 3-7 
virtual, (II-B) 3-9 

Translation buffer (TB), (II-B) 3-10
address space number with, (II-A) 3-11
at context switch, (II-C) 2-10
fault on execute, (II-A) 6-12
fault on read, (II-A) 6-11
fault on write, (II-A) 6-11
granularity hint in PTE, (II-A) 3-5
invalidate all, (II-C) 5-36
invalidate multiple, (II-C) 5-37

invalidate single, (II-C) 5-39

invalidate single data, (II-C) 5-8 

management of, (II-C) 3-5 

recursion in, (II-C) 3-6 

with invalid PTEs, (II-A) 3-12 
Translation buffer check (TBCHK) register 

described, (II-A) 5-23

with translation buffer, (II-A) 3-12 
Translation buffer hint block (TBB), (III) 2-10, (III)

2-14 

Translation buffer invalidate all (TBIA) register 

described, (II-A) 5-24 

with translation buffer, (II-A) 3-12
Translation buffer invalidate all process (TBIAP) 

register 

described, (II-A) 5-25

with translation buffer, (II-A) 3-12
Translation buffer invalidate single (TBIS) register,

(II-A) 5-26

Translation not valid fault, (II-A) 6-10, (II-B) 3-12 

service routine entry point, (II-A) 6-27 
Translation not valid faults, (II-C) 4-3


Trap disable bits, (I) 4-78
denormal operand exception, (I) 4-81
division by zero, (I) 4-81
DZED with DZE arithmetic trap, (I) 4-77
DZED with INV arithmetic trap, (I) 4-76
IEEE compliance and, B-4
inexact result, (I) 4-80
invalid operation, (I) 4-81
overflow disable, (I) 4-81
underflow, (I) 4-80
underflow to zero, (I) 4-80
when unimplemented, (I) 4-79
Trap enable bits, B-5




Trap frames and offsets, (II-C) 4-3
Trap handler, with non-finite arithmetic operands,
(I) 4-74
Trap handling, IEEE floating-point, B-6
Trap modes 
floating-point, (I) 4-69 
Trap shadow, (II-B) 5-2 


defined for floating-point, (I) 4-64 
programming implications for, (I) 5-30


TRAP_CAUSE_UNKNOWN code, (II-C) 4-11


TRAPB (trap barrier) instruction, A-15


described, (I) 4-146 
with FPCR, (I) 4-84 


Traps. See Arithmetic traps 


TrFir trap frame offset 
from ExceptionPC address, (II-C) 4-5 
Trigger instruction, (II-B) 5-2 




True result, (I) 4-64 
True zero, (I) 4-65 


TTY_DEV environment variable, (III) 2-29
with CTB, (III) 2-73
TX BUFFER, field in RXTX buffer area, (III) 2-77 


TXLEN, field in RXTX buffer area, (III) 2-77
TXRDY bitmask, HWRPB field for, (III) 2-10 


TXRDY flag, (III) 2-75
at multiprocessor boot, (III) 3-23


U 


UMULH instruction, (I) 4-36
with MULQ, (I) 4-35 
Unaligned access exceptions, (II-C) 4-7 





Unaligned access fault 
system entry for, (II-B) 5-4
UNALIGNED data objects, (I) 1-8
Unaligned fault entry (entUna) register, (II-B) 1-3,
(II-B) 5-9
Unconditional long jump, (I) 4-23 


UNDEFINED operations, (I) 1-7


Underflow bit, exception summary register, (II-C) 
4-6

Underflow enable (UNFE) 

FP_C quadword bit, B-6 
Underflow status (UNFS) 

FP_C quadword bit, B-5
Underflow trap, (II-A) 6-14, (II-B) 5-5, (II-C) 4-6 
UNDZ bit. See Trap disable bits, underflow to zero 
UNF bit 



exception summary parameter, (II-A) 6-13
exception summary register, (II-B) 5-5, (II-C)
4-6
See also Arithmetic traps, underflow
UNFD bit. See Trap disable bits, underflow
Unique 
process unique value, (II-B) 1-4 
See also Processor unique value 


UNOP code form, A-13 
UNORDERED memory references, (I) 5-10 
Unpack to bytes instructions, (I) 4-159 


UNPKBL (Unpack bytes to longwords) instruction, 
(I) 4-159 
UNPKBW (Unpack bytes to words) instruction, (I)
4-159 
UNPREDICTABLE results, (I) 1-7 
Updated datum, (I) 5-6 


User read enable (URE)
bit in PTE, (II-A) 3-4, (II-B) 3-4
User stack, (II-C) 2-9 


User stack pointer (USP) register, (II-A) 5-27
defined, (II-B) 1-4 
in HWPCB, (II-A) 4-2 
in initial HWPCB, (III) 3-21 
in process context, (II-B) 4-1
internal processor register, (II-A) 5-1 


User write enable (UWE)
bit in PTE, (II-A) 3-4, (II-B) 3-4
USER_BREAKPOINT breakpoint type, (II-C) 4-9 


USP. See User stack pointer 


V 


Valid (V) 

bit in PTE, (II-A) 3-6, (II-B) 3-5, (II-C) 3-5 
Validation, HWRPB field for, (III) 2-6 
vaSize, (II-B) 1-2 


VAX compatibility instructions, restrictions for, (I) 
4-152 
VAX compatibility register, (I) 3-3 
VAX floating-point 
D_floating, (I) 2-5 
F_floating, (I) 2-3
G_floating, (I) 2-4 
See also Floating-point instructions 
VAX floating-point instructions 
add instructions, (I) 4-110 
compare instructions, (I)
4-112 
convert from integer instructions, (I) 4-115 


convert to integer instructions, (I) 4-114 
convert VAX floating format instructions, (I)


divide instructions, (I) 4-121 
from integer move, (I) 4-125
function codes for, C-7

function field format, (I) 4-87
multiply instructions, (I) 4-127
operate instructions, (I) 4-102
square root instructions, (I) 4-129
subtract instructions, (I) 4-131


VAX rounding modes, (I) 4-66 


Vector instructions 


byte and word maximum, (I) 4-155 
byte and word minimum, (I) 4-155 


Virtual address format, (II-A) 3-2 
Virtual address space, (II-A) 3-1, (II-A) 3-2, (II-B)
3-1, (II-C) 3-1 
minimum and maximum, (II-A) 3-2 
page size with, (II-A) 3-2 
Virtual address translation, (II-A) 3-10, (II-B) 3-9,
(II-C) 3-3
Virtual addresses
format of, (II-C) 3-2
non-canonical at fault, (II-C) 4-7 
physical view of, (II-C) 3-3 
virtual view of, (II-C) 3-3 


Virtual cache blocks 
invalidating all, (II-C) 5-36 
invalidating multiple, (II-C) 5-37 
invalidating single, (II-C) 5-39 
Virtual D-cache, (I) 5-4 
Virtual format, (II-B) 3-2 
Virtual I-cache, (I) 5-4 
maintaining coherency of, (I) 5-5 
Virtual machine monitor (VMM), bit in PS register, 
(II-A) 6-6 
Virtual memory regions, initial, (III) 3-16 
Virtual page table base (VPTB) 


HWRPB field for, (III) 2-8
with PALcode switching, (III) 3-7 


Virtual page table base (VPTB) register, (II-A) 5-28 


Virtual page table pointer (VPTPTR), (II-B) 1-4 
with address translation, (II-B) 3-9

Visibility, defined, (I) 5-14 

VPTB. See Virtual page table base 

VPTPTR. See Virtual page table pointer 


W 


Waivers, E-1 
Warm bootstrapping, (III) 3-22 


Watchpoints 


with fault on read, (II-A) 6-11 
with fault on write, (II-A) 6-11


WH64 (Write hint) instruction, (I) 4-147
WH64 instruction 

lock_flag with, (I) 4-10 
whami (PALcode) instruction, (II-B) 2-25


whami, current processor number, (II-B) 1-4 








Who-Am-I (WHAMI) register 


at processor initialization, (III) 3-20 
with PALcode switching, (III) 3-8 


Who-Am-I (WHAMI) register Processor number, 
reading, (II-A) 5-29 
Windows NT Alpha PALcode, instruction summary, 
C-17
WMB (Write memory barrier) instruction, (I) 4-149 


atomic operations with, (I) 5-8 
compared with MB, (I) 4-150
with shared data structures, (I) 5-9 


Word data type, (I) 2-1 
atomic access of, (I) 5-3 
WR_PS_SW (PALcode) instruction, (II-A) 2-20
wrent (PALcode) instruction, (II-B) 2-26
ZEXT(x) operator, (I) 3-10
wrentry (PALcode) instruction, (II-C) 5-41 
at initialization, (II-C) 6-2 
writes GENERAL_ENTRY register, (II-C) 2-4 
writes INTERRUPT_ENTRY register, (II-C) 
2-4
writes MEM_MGMT_ENTRY register, (II-C)
2-5
writes PANIC_ENTRY register, (II-C) 2-5
writes SYSCALL_ENTRY register, (II-C) 2-6


wrfen (PALcode) instruction, (II-B) 2-27
wripir (PALcode) instruction, (II-B) 2-28
Write buffers, requirements for, (I) 5-5


WRITE device routine, (III) 2-55 
characteristics determined by OPEN, (III) 2-56 
WRITE_UNQ (PALcode) instruction, (II-A) 2-81 


Write-back caches, requirements for, (I) 5-5 
wrkgp (PALcode) instruction, (II-B) 2-29
wrmces (PALcode) instruction, (II-B) 2-30, (II-C)


5-43 

wrperfmon (PALcode) instruction, (II-B) 2-31, 
(II-C) 5-44 

wrunique (PALcode) instruction, (II-B) 2-8, (II-B) 
2-9 


required recognition of, (I) 6-4 
wrusp (PALcode) instruction, (II-B) 2-32
wrval (PALcode) instruction, (II-B) 2-33
wrvptptr (PALcode) instruction, (II-B) 2-34 
WTINT (PALcode) instruction, (II-A) 2-94 
wtint (PALcode) instruction, (II-B) 2-35 


X 


OD y operator, (I) 3-8 


X_floating data type, (I) 2-9


alignment of, (I) 2-10 
big-endian format, (I) 2-10 
MAX/MIN, (I) 4-65


XOR instruction, (I) 4-42 
XOR operator, (I) 3-10 


Y 


YUV coordinates, interleaved, (I) 4-154 


Z 


ZAP instruction, (I) 4-61 
ZAPNOT instruction, (I) 4-61
Zero byte instructions, (I) 4-61
Alpha Instruction Set from the Common Architecture (I)


ADDF 4-110
ADDG 4-110
ADDL 4-25
ADDQ 4-27
ADDS 4-111
ADDT 4-111
AMASK 4-134
AND 4-42


BEQ 4-20
BGE 4-20
BGT 4-20
BIC 4-42
BIS 4-42
BLBC 4-20
BLBS 4-20
BLE 4-20
BLT 4-20
BNE 4-20
BR 4-21
BSR 4-21


CALL_PAL 4-136
CMOVEQ 4-43
CMOVGE 4-43
CMOVGT 4-43
CMOVLBC 4-43
CMOVLBS 4-43
CMOVLE 4-43
CMOVNE 4-43
CMPBGE 4-49
CMPEQ 4-29
CMPGEQ 4-112
CMPGLE 4-112
CMPGLT 4-112
CMPLE 4-29
CMPLT 4-29
CMPTEQ 4-113
CMPTLE 4-113
CMPTLT 4-113
CMPTUN 4-113
CMPULE 4-30
CMPULT 4-30
CPYS 4-105
CPYSE 4-105
CPYSN 4-105
CTLZ 4-31
CTPOP 4-32
CTTZ 4-33
CVTDG 4-116
CVTGD 4-116
CVTGF 4-116
CVTGQ 4-114


CVTLQ 4-106
CVTQF 4-115
CVTQG 4-115
CVTQL 4-106
CVTQS 4-118
CVTQT 4-118
CVTST 4-119
CVTTQ 4-117
CVTTS 4-120


DIVF 4-121 
DIVG 4-121 
DIVS 4-122 
DIVT 4-122 


ECB 4-137
EQV 4-42
EXCB 4-139
EXTBL 4-51
EXTLH 4-51
EXTLL 4-51
EXTQH 4-51
EXTQL 4-51
EXTWH 4-51
EXTWL 4-51


FBEQ 4-100
FBGE 4-100
FBGT 4-100
FBLE 4-100
FBLT 4-100
FBNE 4-100
FCMOVEQ 4-107
FCMOVGE 4-107
FCMOVGT 4-107
FCMOVLE 4-107
FCMOVLT 4-107
FCMOVNE 4-107
FETCH 4-140
FETCH_M 4-140
FTOIS 4-123
FTOIT 4-123


IMPLVER 4-142
INSBL 4-55
INSLH 4-55
INSLL 4-55
INSQH 4-55
INSQL 4-55
INSWH 4-55
INSWL 4-55
ITOFF 4-125
ITOFS 4-125
ITOFT 4-125


JMP 4-22
JSR 4-22
JSR_COROUTINE 4-22


LDL_L 4-9
LDA 4-5
LDAH 4-5
LDBU 4-6
LDF 4-91
LDG 4-92
LDL 4-6
LDQ 4-6
LDQ_L 4-9
LDQ_U 4-8
LDS 4-93
LDT 4-94
LDWU 4-6


MAXSB8 4-155
MAXSW4 4-155
MAXUB8 4-155
MAXUW4 4-155
MB 4-143
MF_FPCR 4-109
MINSB8 4-155
MINSW4 4-155
MINUB8 4-155
MINUW4 4-155
MSKBL 4-57
MSKLH 4-57
MSKLL 4-57
MSKQH 4-57
MSKQL 4-57
MSKWH 4-57
MSKWL 4-57
MT_FPCR 4-109
MULF 4-127
MULG 4-127
MULL 4-34
MULQ 4-35
MULS 4-128
MULT 4-128


ORNOT 4-42
PERR 4-157
PKLB 4-158
PKLW 4-158


RC 4-153 


RET 4-22


RPCC 4-144
RS 4-153


S4ADDL 4-26
S4ADDQ 4-28
S4SUBL 4-38
S4SUBQ 4-40
S8ADDL 4-26
S8ADDQ 4-28
S8SUBL 4-38
S8SUBQ 4-40
SEXTB 4-60
SEXTW 4-60
SLL 4-45
SQRTF 4-129
SQRTG 4-129
SQRTS 4-130
SQRTT 4-130
SRA 4-46
SRL 4-45
STB 4-15
STF 4-95
STG 4-96
STL 4-15
STL_C 4-12
STQ 4-15
STQ_C 4-12
STQ_U 4-17
STS 4-97
STT 4-98
STW 4-15
SUBF 4-131
SUBG 4-131
SUBL 4-37
SUBQ 4-39
SUBS 4-132
SUBT 4-132
TRAPB 4-146


UMULH 4-36
UNPKBL 4-159
UNPKBW 4-159


WMB 4-149
WH64 4-147


XOR 4-42


ZAP 4-61
ZAPNOT 4-61
OpenVMS Alpha (Section II-A) PALcode Instruction Index


AMOVRM 2-75 
AMOVRR 2-75 


BPT 2-4 
BUGCHK 2-5 


CFLUSH 2-83
CHME 2-6
CHMK 2-7
CHMS 2-8
CHMU 2-9
CLRFEN 2-10
CSERVE 2-84


GENTRAP 2-11


INSQHIL 2-30
INSQHILR 2-32
INSQHIQ 2-34
INSQHIQR 2-36
INSQTIL 2-38
INSQTILR 2-40
INSQTIQ 2-42
INSQTIQR 2-44
INSQUEL 2-46
INSQUEQ 2-48


LDQP 2-85


MFPR_IPR_name 2-86
MTPR_IPR_name 2-87


PROBE 2-12


RD_PS 2-14
READ_UNQ 2-80
REI 2-15
REMQHIL 2-50
REMQHILR 2-53
REMQHIQ 2-55
REMQHIQR 2-58
REMQTIL 2-60
REMQTILR 2-63
REMQTIQ 2-65
REMQTIQR 2-68
REMQUEL 2-70
REMQUEQ 2-72
RSCC 2-17


STQP 2-88
SWASTEN 2-19
SWPPAL 2-92


WR_PS_SW 2-20
WRITE_UNQ 2-81
WTINT 2-94
DIGITAL UNIX (Section II-B) PALcode Instruction Index


bpt 2-2
bugchk 2-3


callsys 2-4
cflush 2-11
clrfen 2-5
cserve 2-12


gentrap 2-6
rdmces 2-13


rdps 2-14
rdunique 2-7


rdusp 2-15
rdval 2-16
retsys 2-17
rti 2-18


swpctx 2-19
swpipl 2-21
swppal 2-22
tbi 2-24


urti 2-8
whami 2-25
wrent 2-26
wrfen 2-27
wripir 2-28
wrkgp 2-29
wrmces 2-30
wrperfmon 2-31
wrunique 2-9
wrusp 2-32
wrval 2-33
wrvptptr 2-34
wtint 2-35
Windows NT Alpha (Section II-C) PALcode Instruction Index


bpt 5-46


callkd 5-47
callsys 5-48
csir 5-4


dalnfix 5-5
di 5-6
draina 5-7
dtbis 5-8


ealnfix 5-9
ei 5-10


gentrap 5-50


halt 5-11
imb 5-51


initpal 5-12
initpcr 5-14


kbpt 5-52


rdcounters 5-15


rdirql 5-16
rdksp 5-17
rdmces 5-18
rdpcr 5-19
rdpsr 5-20
rdstate 5-21
rdteb 5-53
rdthread 5-22
reboot 5-23
restart 5-24
retsys 5-25
rfe 5-27
ssir 5-29

swpctx 5-30
swpirql 5-32
swpksp 5-33
swppal 5-34
swpprocess 5-35


tbia 5-36
tbim 5-37
tbimasn 5-38
tbis 5-39
tbisasn 5-40


wrentry 5-41
wrmces 5-43
wrperfmon 5-44




